aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-04-02VSOCK: Handle changes to the VMCI context ID.Reilly Grant
The VMCI context ID of a virtual machine may change at any time. There is a VMCI event which signals this but datagrams may be processed before this is handled. It is therefore necessary to be flexible about the destination context ID of any datagrams received. (It can be assumed to be correct because it is provided by the hypervisor.) The context ID on existing sockets should be updated to reflect how the hypervisor is currently referring to the system. Signed-off-by: Reilly Grant <grantr@vmware.com> Acked-by: Andy King <acking@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02net IPv6 : Fix broken IPv6 routing table after loopback down-upBalakumaran Kannan
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo) interface. After down-up, routes of other interface's IPv6 addresses through 'lo' are lost. IPv6 addresses assigned to all interfaces are routed through 'lo' for internal communication. Once 'lo' is down, those routing entries are removed from routing table. But those removed entries are not being re-created properly when 'lo' is brought up. So IPv6 addresses of other interfaces becomes unreachable from the same machine. Also this breaks communication with other machines because of NDISC packet processing failure. This patch fixes this issue by reading all interface's IPv6 addresses and adding them to IPv6 routing table while bringing up 'lo'. ==Testing== Before applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ After applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02cbq: incorrect processing of high limitsVasily Averin
currently cbq works incorrectly for limits > 10% real link bandwidth, and practically does not work for limits > 50% real link bandwidth. Below are results of experiments taken on 1 Gbit link In shaper | Actual Result -----------+--------------- 100M | 108 Mbps 200M | 244 Mbps 300M | 412 Mbps 500M | 893 Mbps This happen because of q->now changes incorrectly in cbq_dequeue(): when it is called before real end of packet transmitting, L2T is greater than real time delay, q_now gets an extra boost but never compensate it. To fix this problem we prevent change of q->now until its synchronization with real time. Signed-off-by: Vasily Averin <vvs@openvz.org> Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) sadb_msg prepared for IPSEC userspace forgets to initialize the satype field, fix from Nicolas Dichtel. 2) Fix mac80211 synchronization during station removal, from Johannes Berg. 3) Fix IPSEC sequence number notifications when they wrap, from Steffen Klassert. 4) Fix cfg80211 wdev tracing crashes when add_virtual_intf() returns an error pointer, from Johannes Berg. 5) In mac80211, don't call into the channel context code with the interface list mutex held. From Johannes Berg. 6) In mac80211, if we don't actually associate, do not restart the STA timer, otherwise we can crash. From Ben Greear. 7) Missing dma_mapping_error() check in e1000, ixgb, and e1000e. From Christoph Paasch. 8) Fix sja1000 driver defines to not conflict with SH port, from Marc Kleine-Budde. 9) Don't call il4965_rs_use_green with a NULL station, from Colin Ian King. 10) Suspend/Resume in the FEC driver fail because the buffer descriptors are not initialized at all the moments in which they should. Fix from Frank Li. 11) cpsw and davinci_emac drivers both use the wrong interface to restart a stopped TX queue. Use netif_wake_queue not netif_start_queue, the latter is for initialization/bringup not active management of the queue. From Mugunthan V N. 12) Fix regression in rate calculations done by psched_ratecfg_precompute(), missing u64 type promotion. From Sergey Popovich. 13) Fix length overflow in tg3 VPD parsing, from Kees Cook. 14) AOE driver fails to allocate enough headroom, resulting in crashes. Fix from Eric Dumazet. 15) RX overflow happens too quickly in sky2 driver because pause packet thresholds are not programmed correctly. From Mirko Lindner. 16) Bonding driver manages arp_interval and miimon settings incorrectly, disabling one unintentionally disables both. Fix from Nikolay Aleksandrov. 17) smsc75xx drivers don't program the RX mac properly for jumbo frames. Fix from Steve Glendinning. 18) Fix off-by-one in Codel packet scheduler. From Vijay Subramanian. 19) Fix packet corruption in atl1c by disabling MSI support, from Hannes Frederic Sowa. 20) netdev_rx_handler_unregister() needs a synchronize_net() to fix crashes in bonding driver unload stress tests. From Eric Dumazet. 21) rxlen field of ks8851 RX packet descriptors not interpreted correctly (it is 12 bits not 16 bits, so needs to be masked after shifting the 32-bit value down 16 bits). Fix from Max Nekludov. 22) Fix missed RX/TX enable in sh_eth driver due to mishandling of link change indications. From Sergei Shtylyov. 23) Fix crashes during spurious ECI interrupts in sh_eth driver, also from Sergei Shtylyov. 24) dm9000 driver initialization is done wrong for revision B devices with DSP PHY, from Joseph CHANG. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits) DM9000B: driver initialization upgrade sh_eth: make 'link' field of 'struct sh_eth_private' *int* sh_eth: workaround for spurious ECI interrupt sh_eth: fix handling of no LINK signal ks8851: Fix interpretation of rxlen field. net: add a synchronize_net() in netdev_rx_handler_unregister() MAINTAINERS: Update netxen_nic maintainers list atl1e: drop pci-msi support because of packet corruption net: fq_codel: Fix off-by-one error net: calxedaxgmac: Wake-on-LAN fixes net: calxedaxgmac: fix rx ring handling when OOM net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb' smsc75xx: fix jumbo frame support net: fix the use of this_cpu_ptr bonding: fix disabling of arp_interval and miimon ipv6: don't accept node local multicast traffic from the wire sky2: Threshold for Pause Packet is set wrong sky2: Receive Overflows not counted aoe: reserve enough headroom on skbs line up comment for ndo_bridge_getlink ...
2013-03-29net: add a synchronize_net() in netdev_rx_handler_unregister()Eric Dumazet
commit 35d48903e97819 (bonding: fix rx_handler locking) added a race in bonding driver, reported by Steven Rostedt who did a very good diagnosis : <quoting Steven> I'm currently debugging a crash in an old 3.0-rt kernel that one of our customers is seeing. The bug happens with a stress test that loads and unloads the bonding module in a loop (I don't know all the details as I'm not the one that is directly interacting with the customer). But the bug looks to be something that may still be present and possibly present in mainline too. It will just be much harder to trigger it in mainline. In -rt, interrupts are threads, and can schedule in and out just like any other thread. Note, mainline now supports interrupt threads so this may be easily reproducible in mainline as well. I don't have the ability to tell the customer to try mainline or other kernels, so my hands are somewhat tied to what I can do. But according to a core dump, I tracked down that the eth irq thread crashed in bond_handle_frame() here: slave = bond_slave_get_rcu(skb->dev); bond = slave->bond; <--- BUG the slave returned was NULL and accessing slave->bond caused a NULL pointer dereference. Looking at the code that unregisters the handler: void netdev_rx_handler_unregister(struct net_device *dev) { ASSERT_RTNL(); RCU_INIT_POINTER(dev->rx_handler, NULL); RCU_INIT_POINTER(dev->rx_handler_data, NULL); } Which is basically: dev->rx_handler = NULL; dev->rx_handler_data = NULL; And looking at __netif_receive_skb() we have: rx_handler = rcu_dereference(skb->dev->rx_handler); if (rx_handler) { if (pt_prev) { ret = deliver_skb(skb, pt_prev, orig_dev); pt_prev = NULL; } switch (rx_handler(&skb)) { My question to all of you is, what stops this interrupt from happening while the bonding module is unloading? What happens if the interrupt triggers and we have this: CPU0 CPU1 ---- ---- rx_handler = skb->dev->rx_handler netdev_rx_handler_unregister() { dev->rx_handler = NULL; dev->rx_handler_data = NULL; rx_handler() bond_handle_frame() { slave = skb->dev->rx_handler; bond = slave->bond; <-- NULL pointer dereference!!! What protection am I missing in the bond release handler that would prevent the above from happening? </quoting Steven> We can fix bug this in two ways. First is adding a test in bond_handle_frame() and others to check if rx_handler_data is NULL. A second way is adding a synchronize_net() in netdev_rx_handler_unregister() to make sure that a rcu protected reader has the guarantee to see a non NULL rx_handler_data. The second way is better as it avoids an extra test in fast path. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jpirko@redhat.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29net: fq_codel: Fix off-by-one errorVijay Subramanian
Currently, we hold a max of sch->limit -1 number of packets instead of sch->limit packets. Fix this off-by-one error. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'Shmulik Ladkani
'nf_reset' is called just prior calling 'netif_rx'. No need to call it twice. Reported-by: Igor Michailov <rgohita@gmail.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29net: fix the use of this_cpu_ptrLi RongQing
flush_tasklet is not percpu var, and percpu is percpu var, and this_cpu_ptr(&info->cache->percpu->flush_tasklet) is not equal to &this_cpu_ptr(info->cache->percpu)->flush_tasklet 1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29ipv6: don't accept node local multicast traffic from the wireHannes Frederic Sowa
Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been verified: http://www.rfc-editor.org/errata_search.php?eid=3480 We have to check for pkt_type and loopback flag because either the packets are allowed to travel over the loopback interface (in which case pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel over a non-loopback interface back to us (in which case PACKET_TYPE is PACKET_LOOPBACK and IFF_LOOPBACK flag is not set). Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fixes from Eric W Biederman: "The bulk of the changes are fixing the worst consequences of the user namespace design oversight in not considering what happens when one namespace starts off as a clone of another namespace, as happens with the mount namespace. The rest of the changes are just plain bug fixes. Many thanks to Andy Lutomirski for pointing out many of these issues." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Restrict when proc and sysfs can be mounted ipc: Restrict mounting the mqueue filesystem vfs: Carefully propogate mounts across user namespaces vfs: Add a mount flag to lock read only bind mounts userns: Don't allow creation if the user is chrooted yama: Better permission check for ptraceme pid: Handle the exit of a multi-threaded init. scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
2013-03-28Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-03-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Initialize the satype field in key_notify_policy_flush(), this was left uninitialized. From Nicolas Dichtel. 2) The sequence number difference for replay notifications was misscalculated on ESN sequence number wrap. We need a separate replay notify function for esn. 3) Fix an off by one in the esn replay notify function. From Mathias Krause. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27sch: add missing u64 in psched_ratecfg_precompute()Sergey Popovich
It seems that commit commit 292f1c7ff6cc10516076ceeea45ed11833bb71c7 Author: Jiri Pirko <jiri@resnulli.us> Date: Tue Feb 12 00:12:03 2013 +0000 sch: make htb_rate_cfg and functions around that generic adds little regression. Before: # tc qdisc add dev eth0 root handle 1: htb default ffff # tc class add dev eth0 classid 1:ffff htb rate 5Gbit # tc -s class show dev eth0 class htb 1:ffff root prio 0 rate 5000Mbit ceil 5000Mbit burst 625b cburst 625b Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) rate 0bit 0pps backlog 0b 0p requeues 0 lended: 0 borrowed: 0 giants: 0 tokens: 31 ctokens: 31 After: # tc qdisc add dev eth0 root handle 1: htb default ffff # tc class add dev eth0 classid 1:ffff htb rate 5Gbit # tc -s class show dev eth0 class htb 1:ffff root prio 0 rate 1544Mbit ceil 1544Mbit burst 625b cburst 625b Sent 5073 bytes 41 pkt (dropped 0, overlimits 0 requeues 0) rate 1976bit 2pps backlog 0b 0p requeues 0 lended: 41 borrowed: 0 giants: 0 tokens: 1802 ctokens: 1802 This probably due to lost u64 cast of rate parameter in psched_ratecfg_precompute() (net/sched/sch_generic.c). Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27rtnetlink: fix error return code in rtnl_link_fill()Wei Yongjun
Fix to return a negative error code from the error handling case instead of 0(possible overwrite to 0 by ops->fill_xstats call), as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Always increment IPV4 ID field in encapsulated GSO packets, even when DF is set. Regression fix from Pravin B Shelar. 2) Fix per-net subsystem initialization in netfilter conntrack, otherwise we may access dynamically allocated memory before it is actually allocated. From Gao Feng. 3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka. 4) Fix race between submission of sync vs async commands in mwifiex driver, from Amitkumar Karwar. 5) Add missing cancel of command timer in mwifiex driver, from Bing Zhao. 6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna. 7) Thermal layer tries to use a genetlink multicast string that is longer than the 16 character limit. Fix it and add a BUG check to prevent this kind of thing from happening in the future. From Masatake YAMATO. 8) Fix many bugs in the handling of the teardown of L2TP connections, UDP encapsulation instances, and sockets. From Tom Parkin. 9) Missing socket release in IRDA, from Kees Cook. 10) Fix fec driver modular build, from Fabio Estevam. 11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop, from Wei Yongjun. 12) Fix bugs in handling of queue numbers and steering rules in mlx4 driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz. 13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey Vagin. 14) TCP segmentation deferral is unintentionally done too strongly, breaking ACK clocking. Fix from Eric Dumazet. 15) net_enable_timestamp() can legitimately be invoked from software interrupts, and in a way that is safe, so remove the WARN_ON(). Also from Eric Dumazet. 16) Fix use after free in VLANs, from Cong Wang. 17) Fix TCP slow start retransmit storms after SACK reneging, from Yuchung Cheng. 18) Unix socket release should mark a socket dead before NULL'ing out sock->sk, otherwise we can race. Fix from Paul Moore. 19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo. 20) Fix register mis-programming, NULL pointer derefs, and wrong PHC clock frequency in IGB driver. From Lior LevyAlex Williamson, Jiri Benc, and Jeff Kirsher. 21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet forwarding. Fix from Veaceslav Falico. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) ipv4: Fix ip-header identification for gso packets. bonding: remove already created master sysfs link on failure af_unix: dont send SCM_CREDENTIAL when dest socket is NULL pch_gbe: fix ip_summed checksum reporting on rx igb: fix PHC stopping on max freq igb: make sensor info static igb: SR-IOV init reordering igb: Fix null pointer dereference igb: fix i350 anti spoofing config ixgbevf: don't release the soft entries ipv6: fix bad free of addrconf_init_net unix: fix a race condition in unix_release() tcp: undo spurious timeout after SACK reneging bnx2x: fix assignment of signed expression to unsigned variable bridge: fix crash when set mac address of br interface 8021q: fix a potential use-after-free net: remove a WARN_ON() in net_enable_timestamp() tcp: preserve ACK clocking in TSO net: fix *_DIAG_MAX constants net/mlx4_core: Disallow releasing VF QPs which have steering rules ...
2013-03-26Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fix an NFSv4 idmapper regression - Fix an Oops in the pNFS blocks client - Fix up various issues with pNFS layoutcommit - Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked * tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked NFSv4.1: Add a helper pnfs_commit_and_return_layout NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn NFSv4.1: Fix a race in pNFS layoutcommit pnfs-block: removing DM device maybe cause oops when call dev_remove NFSv4: Fix the string length returned by the idmapper
2013-03-26ipv4: Fix ip-header identification for gso packets.Pravin B Shelar
ip-header id needs to be incremented even if IP_DF flag is set. This behaviour was changed in commit 490ab08127cebc25e3a26 (IP_GRE: Fix IP-Identification). Following patch fixes it so that identification is always incremented. Reported-by: Cong Wang <amwang@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26af_unix: dont send SCM_CREDENTIAL when dest socket is NULLdingtianhong
SCM_SCREDENTIALS should apply to write() syscalls only either source or destination socket asserted SOCK_PASSCRED. The original implememtation in maybe_add_creds is wrong, and breaks several LSB testcases ( i.e. /tset/LSB.os/netowkr/recvfrom/T.recvfrom). Origionally-authored-by: Karel Srot <ksrot@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2013-03-25ipv6: fix bad free of addrconf_init_netHong Zhiguo
Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25unix: fix a race condition in unix_release()Paul Moore
As reported by Jan, and others over the past few years, there is a race condition caused by unix_release setting the sock->sk pointer to NULL before properly marking the socket as dead/orphaned. This can cause a problem with the LSM hook security_unix_may_send() if there is another socket attempting to write to this partially released socket in between when sock->sk is set to NULL and it is marked as dead/orphaned. This patch fixes this by only setting sock->sk to NULL after the socket has been marked as dead; I also take the opportunity to make unix_release_sock() a void function as it only ever returned 0/success. Dave, I think this one should go on the -stable pile. Special thanks to Jan for coming up with a reproducer for this problem. Reported-by: Jan Stancek <jan.stancek@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_lockedTrond Myklebust
We need to be careful when testing task->tk_waitqueue in rpc_wake_up_task_queue_locked, because it can be changed while we are holding the queue->lock. By adding appropriate memory barriers, we can ensure that it is safe to test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-03-25xfrm: Fix esn sequence number diff calculation in xfrm_replay_notify_esn()Mathias Krause
Commit 0017c0b "xfrm: Fix replay notification for esn." is off by one for the sequence number wrapped case as UINT_MAX is 0xffffffff, not 0x100000000. ;) Just calculate the diff like done everywhere else in the file. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-03-24tcp: undo spurious timeout after SACK renegingYuchung Cheng
On SACK reneging the sender immediately retransmits and forces a timeout but disables Eifel (undo). If the (buggy) receiver does not drop any packet this can trigger a false slow-start retransmit storm driven by the ACKs of the original packets. This can be detected with undo and TCP timestamps. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24bridge: fix crash when set mac address of br interfaceHong zhi guo
When I tried to set mac address of a bridge interface to a mac address which already learned on this bridge, I got system hang. The cause is straight forward: function br_fdb_change_mac_address calls fdb_insert with NULL source nbp. Then an fdb lookup is performed. If an fdb entry is found and it's local, it's OK. But if it's not local, source is dereferenced for printk without NULL check. Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-248021q: fix a potential use-after-freeCong Wang
vlan_vid_del() could possibly free ->vlan_info after a RCU grace period, however, we may still refer to the freed memory area by 'grp' pointer. Found by code inspection. This patch moves vlan_vid_del() as behind as possible. Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24net: remove a WARN_ON() in net_enable_timestamp()Eric Dumazet
The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false positive, in socket clone path, run from softirq context : [ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80() [ 3641.668811] Call Trace: [ 3641.671254] <IRQ> [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0 [ 3641.677871] [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20 [ 3641.683683] [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80 [ 3641.689668] [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450 [ 3641.695222] [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170 [ 3641.701213] [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820 [ 3641.707663] [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670 [ 3641.713354] [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0 [ 3641.719425] [<ffffffff807af63a>] tcp_check_req+0x3da/0x530 [ 3641.724979] [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80 [ 3641.730964] [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0 [ 3641.736430] [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0 [ 3641.741985] [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0 Its safe at this point because the parent socket owns a reference on the netstamp_needed, so we cant have a 0 -> 1 transition, which requires to lock a mutex. Instead of refining the check, lets remove it, as all known callers are safe. If it ever changes in the future, static_key_slow_inc() will complain anyway. Reported-by: Laurent Chavey <chavey@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24mac80211: Don't restart sta-timer if not associated.Ben Greear
I found another crash when deleting lots of virtual stations in a congested environment. I think the problem is that the ieee80211_mlme_notify_scan_completed could call ieee80211_restart_sta_timer for a stopped interface that was about to be deleted. With the following patch I am unable to reproduce the crash. Signed-off-by: Ben Greear <greearb@candelatech.com> [move check, also make the same change in mesh] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-24cfg80211: always check for scan end on P2P deviceJohannes Berg
If a P2P device wdev is removed while it has a scan, then the scan completion might crash later as it is already freed by that time. To avoid the crash always check the scan completion when the P2P device is being removed for some reason. If the driver already canceled it, don't want and free it, otherwise warn and leak it to avoid later crashes. In order to do this, locking needs to be changed away from the rdev mutex (which can't always be guaranteed). For now, use the sched_scan_mtx instead, I'll rename it to just scan_mtx in a later patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-22tcp: preserve ACK clocking in TSOEric Dumazet
A long standing problem with TSO is the fact that tcp_tso_should_defer() rearms the deferred timer, while it should not. Current code leads to following bad bursty behavior : 20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119 20:11:24.484337 IP B > A: . ack 263721 win 1117 20:11:24.485086 IP B > A: . ack 265241 win 1117 20:11:24.485925 IP B > A: . ack 266761 win 1117 20:11:24.486759 IP B > A: . ack 268281 win 1117 20:11:24.487594 IP B > A: . ack 269801 win 1117 20:11:24.488430 IP B > A: . ack 271321 win 1117 20:11:24.489267 IP B > A: . ack 272841 win 1117 20:11:24.490104 IP B > A: . ack 274361 win 1117 20:11:24.490939 IP B > A: . ack 275881 win 1117 20:11:24.491775 IP B > A: . ack 277401 win 1117 20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119 20:11:24.492620 IP B > A: . ack 278921 win 1117 20:11:24.493448 IP B > A: . ack 280441 win 1117 20:11:24.494286 IP B > A: . ack 281961 win 1117 20:11:24.495122 IP B > A: . ack 283481 win 1117 20:11:24.495958 IP B > A: . ack 285001 win 1117 20:11:24.496791 IP B > A: . ack 286521 win 1117 20:11:24.497628 IP B > A: . ack 288041 win 1117 20:11:24.498459 IP B > A: . ack 289561 win 1117 20:11:24.499296 IP B > A: . ack 291081 win 1117 20:11:24.500133 IP B > A: . ack 292601 win 1117 20:11:24.500970 IP B > A: . ack 294121 win 1117 20:11:24.501388 IP B > A: . ack 295641 win 1117 20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119 While the expected behavior is more like : 20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119 20:19:49.260446 IP B > A: . ack 154281 win 1212 20:19:49.261282 IP B > A: . ack 155801 win 1212 20:19:49.262125 IP B > A: . ack 157321 win 1212 20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119 20:19:49.262958 IP B > A: . ack 158841 win 1212 20:19:49.263795 IP B > A: . ack 160361 win 1212 20:19:49.264628 IP B > A: . ack 161881 win 1212 20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119 20:19:49.265465 IP B > A: . ack 163401 win 1212 20:19:49.265886 IP B > A: . ack 164921 win 1212 20:19:49.266722 IP B > A: . ack 166441 win 1212 20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119 20:19:49.267559 IP B > A: . ack 167961 win 1212 20:19:49.268394 IP B > A: . ack 169481 win 1212 20:19:49.269232 IP B > A: . ack 171001 win 1212 20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20mac80211: fix virtual monitor interface lockingJohannes Berg
The virtual monitor interface has a locking issue, it calls into the channel context code with the iflist mutex held which isn't allowed since it is usually acquired the other way around. The mutex is still required for the interface iteration, but need not be held across the channel calls. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-20cfg80211: fix wdev tracing crashJohannes Berg
Arend reported a crash in tracing if the driver returns an ERR_PTR() value from the add_virtual_intf() callback. This is due to the tracing then still attempting to dereference the "pointer", fix this by using IS_ERR_OR_NULL(). Reported-by: Arend van Spriel <arend@broadcom.com> Tested-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-20Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-03-20net/irda: add missing error path release_sock callKees Cook
This makes sure that release_sock is called for all error conditions in irda_getsockopt. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Brad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20ipconfig: Fix newline handling in log message.Martin Fuzzey
When using ipconfig the logs currently look like: Single name server: [ 3.467270] IP-Config: Complete: [ 3.470613] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.480670] host=infigo-1, domain=, nis-domain=(none) [ 3.486166] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.492910] nameserver0=172.16.42.1[ 3.496853] ALSA device list: Three name servers: [ 3.496949] IP-Config: Complete: [ 3.500293] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.510367] host=infigo-1, domain=, nis-domain=(none) [ 3.515864] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.522635] nameserver0=172.16.42.1, nameserver1=172.16.42.100 [ 3.529149] , nameserver2=172.16.42.200 Fix newline handling for these cases Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20flow_keys: include thoff into flow_keys for later usageDaniel Borkmann
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're doing the work here anyway, also store thoff for a later usage, e.g. in the BPF filter. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: unhash l2tp sessions on delete, not on freeTom Parkin
If we postpone unhashing of l2tp sessions until the structure is freed, we risk: 1. further packets arriving and getting queued while the pseudowire is being closed down 2. the recv path hitting "scheduling while atomic" errors in the case that recv drops the last reference to a session and calls l2tp_session_free while in atomic context As such, l2tp sessions should be unhashed from l2tp_core data structures early in the teardown process prior to calling pseudowire close. For pseudowires like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: avoid deadlock in l2tp stats updateTom Parkin
l2tp's u64_stats writers were incorrectly synchronised, making it possible to deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp code netlink commands while passing data through l2tp sessions. Previous discussion on netdev determined that alternative solutions such as spinlock writer synchronisation or per-cpu data would bring unjustified overhead, given that most users interested in high volume traffic will likely be running 64bit kernels on 64bit hardware. As such, this patch replaces l2tp's use of u64_stats with atomic_long_t, thereby avoiding the deadlock. Ref: http://marc.info/?l=linux-netdev&m=134029167910731&w=2 http://marc.info/?l=linux-netdev&m=134079868111131&w=2 Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: push all ppp pseudowire shutdown through .release handlerTom Parkin
If userspace deletes a ppp pseudowire using the netlink API, either by directly deleting the session or by deleting the tunnel that contains the session, we need to tear down the corresponding pppox channel. Rather than trying to manage two pppox unbind codepaths, switch the netlink and l2tp_core session_close handlers to close via. the l2tp_ppp socket .release handler. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: purge session reorder queue on deleteTom Parkin
Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall and l2tp_session_delete. Pseudowire implementations which are deleted only via. l2tp_core l2tp_session_delete calls can dispense with their own code for flushing the reorder queue. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: add session reorder queue purge function to coreTom Parkin
If an l2tp session is deleted, it is necessary to delete skbs in-flight on the session's reorder queue before taking it down. Rather than having each pseudowire implementation reaching into the l2tp_session struct to handle this itself, provide a function in l2tp_core to purge the session queue. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: don't BUG_ON sk_socket being NULLTom Parkin
It is valid for an existing struct sock object to have a NULL sk_socket pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookupTom Parkin
When looking up the tunnel socket in struct l2tp_tunnel, hold a reference whether the socket was created by the kernel or by userspace. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: close sessions before initiating tunnel deleteTom Parkin
When a user deletes a tunnel using netlink, all the sessions in the tunnel should also be deleted. Since running sessions will pin the tunnel socket with the references they hold, have the l2tp_tunnel_delete close all sessions in a tunnel before finally closing the tunnel socket. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: close sessions in ip socket destroy callbackTom Parkin
l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel socket being closed from userspace. We need to do the same thing for IP-encapsulation sockets. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: export l2tp_tunnel_closeallTom Parkin
l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a tunnel when a UDP-encapsulation socket is destroyed. We need to do something similar for IP-encapsulation sockets. Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to call it from their .destroy handlers. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20l2tp: add udp encap socket destroy handlerTom Parkin
L2TP sessions hold a reference to the tunnel socket to prevent it going away while sessions are still active. However, since tunnel destruction is handled by the sock sk_destruct callback there is a catch-22: a tunnel with sessions cannot be deleted since each session holds a reference to the tunnel socket. If userspace closes a managed tunnel socket, or dies, the tunnel will persist and it will be neccessary to individually delete the sessions using netlink commands. This is ugly. To prevent this occuring, this patch leverages the udp encapsulation socket destroy callback to gain early notification when the tunnel socket is closed. This allows us to safely close the sessions running in the tunnel, dropping the tunnel socket references in the process. The tunnel socket is then destroyed as normal, and the tunnel resources deallocated in sk_destruct. While we're at it, ensure that l2tp_tunnel_closeall correctly drops session references to allow the sessions to be deleted rather than leaking. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20udp: add encap_destroy callbackTom Parkin
Users of udp encapsulation currently have an encap_rcv callback which they can use to hook into the udp receive path. In situations where a encapsulation user allocates resources associated with a udp encap socket, it may be convenient to be able to also hook the proto .destroy operation. For example, if an encap user holds a reference to the udp socket, the destroy hook might be used to relinquish this reference. This patch adds a socket destroy hook into udp, which is set and enabled in the same way as the existing encap_rcv hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20genetlink: trigger BUG_ON if a group name is too longMasatake YAMATO
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are: * Restrict IPv6 stateless NPT targets to the mangle table. Many users are complaining that this target does not work in the nat table, which is the wrong table for it, from Florian Westphal. * Fix possible use before initialization in the netns init path of several conntrack protocol trackers (introduced recently while improving conntrack netns support), from Gao Feng. * Fix incorrect initialization of copy_range in nfnetlink_queue, spotted by Eric Dumazet during the NFWS2013, patch from myself. * Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov. * Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu not required anymore after change introduced in 3.7, again from Julian. * Fix SYN looping in IPVS state sync if the backup is used a real server in DR/TUN modes, this required a new /proc entry to disable the director function when acting as backup, also from Julian. * Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by Paul Bolle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>