aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/bonding/bond_main.c
AgeCommit message (Collapse)Author
2011-10-30bonding: eliminate bond_close race conditionsJay Vosburgh
This patch resolves two sets of race conditions. Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the first, as follows: The bond_close() calls cancel_delayed_work() to cancel delayed works. It, however, cannot cancel works that were already queued in workqueue. The bond_open() initializes work->data, and proccess_one_work() refers get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when work->data has been initialized. Thus, a panic occurs. He included a patch that converted the cancel_delayed_work calls in bond_close to flush_delayed_work_sync, which eliminated the above problem. His patch is incorporated, at least in principle, into this patch. In this patch, we use cancel_delayed_work_sync in place of flush_delayed_work_sync, and also convert bond_uninit in addition to bond_close. This conversion to _sync, however, opens new races between bond_close and three periodically executing workqueue functions: bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon. The race occurs because bond_close and bond_uninit are always called with RTNL held, and these workqueue functions may acquire RTNL to perform failover-related activities. If bond_close or bond_uninit is waiting in cancel_delayed_work_sync, deadlock occurs. These deadlocks are resolved by having the workqueue functions acquire RTNL conditionally. If the rtnl_trylock() fails, the functions reschedule and return immediately. For the cases that are attempting to perform link failover, a delay of 1 is used; for the other cases, the normal interval is used (as those activities are not as time critical). Additionally, the bond_mii_monitor function now stores the delay in a variable (mimicing the structure of activebackup_arp_mon). Lastly, all of the above renders the kill_timers sentinel moot, and therefore it has been removed. Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-25net: make bonding slaves honour master's skb->priorityMaciej Żenczykowski
Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-10-19bonding: Use a per netns implementation of /sys/class/net/bonding_masters.Eric W. Biederman
This fixes a network namespace misfeature that bonding_masters looked at current instead of the remembering the context where in which /sys/class/net/bonding_masters was opened in to see which network namespace to act upon. This removes the need for sysfs to handle tagged directories with untagged members allowing for a conceptually simpler sysfs implementation. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19bonding: use local function pointer of bond->recv_probe in bond_handle_frameMitsuo Hayasaka
The bond->recv_probe is called in bond_handle_frame() when a packet is received, but bond_close() sets it to NULL. So, a panic occurs when both functions work in parallel. Why this happen: After null pointer check of bond->recv_probe, an sk_buff is duplicated and bond->recv_probe is called in bond_handle_frame. So, a panic occurs when bond_close() is called between the check and call of bond->recv_probe. Patch: This patch uses a local function pointer of bond->recv_probe in bond_handle_frame(). So, it can avoid the null pointer dereference. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-07Merge branch 'master' of github.com:davem330/netDavid S. Miller
Conflicts: net/batman-adv/soft-interface.c
2011-10-03bonding: properly stop queuing work when requestedAndy Gospodarek
During a test where a pair of bonding interfaces using ARP monitoring were both brought up and torn down (with an rmmod) repeatedly, a panic in the timer code was noticed. I tracked this down and determined that any of the bonding functions that ran as workqueue handlers and requeued more work might not properly exit when the module was removed. There was a flag protected by the bond lock called kill_timers that is set when the interface goes down or the module is removed, but many of the functions that monitor link status now unlock the bond lock to take rtnl first. There is a chance that another CPU running the rmmod could get the lock and set kill_timers after the first check has passed. This patch does not allow any function to queue work that will make itself run unless kill_timers is not set. I also noticed while doing this work that bond_resend_igmp_join_requests did not have a check for kill_timers, so I added the needed call there as well. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reported-by: Liang Zheng <lzheng@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15net: consolidate and fix ethtool_ops->get_settings callingJiri Pirko
This patch does several things: - introduces __ethtool_get_settings which is called from ethtool code and from drivers as well. Put ASSERT_RTNL there. - dev_ethtool_get_settings() is replaced by __ethtool_get_settings() - changes calling in drivers so rtnl locking is respected. In iboe_get_rate was previously ->get_settings() called unlocked. This fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same problem. Also fixed by calling __dev_get_by_index() instead of dev_get_by_index() and holding rtnl_lock for both calls. - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create() so bnx2fc_if_create() and fcoe_if_create() are called locked as they are from other places. - use __ethtool_get_settings() in bonding code Signed-off-by: Jiri Pirko <jpirko@redhat.com> v2->v3: -removed dev_ethtool_get_settings() -added ASSERT_RTNL into __ethtool_get_settings() -prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock around it and __ethtool_get_settings() call v1->v2: add missing export_symbol Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits] Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-08-17net: remove use of ndo_set_multicast_list in driversJiri Pirko
replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17bonding: use ndo_change_rx_flags callbackJiri Pirko
Benefit from use of ndo_change_rx_flags in handling change of promisc and allmulti. No need to store previous state locally. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17bonding:reset backup and inactive flag of slavePeter Pan(潘卫平)
Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change bonding mode from active backup to round robin, some slaves are still keeping "backup", and won't transmit packets. As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around that by removing the bond_is_active_slave() check, because the "backup" flag is only meaningful for active backup mode. But if we just simply ignore the bond_is_active_slave() check, the transmission will work fine, but we can't maintain the correct value of "backup" flag for each slaves, though it is meaningless for other mode than active backup. I'd like to reset "backup" and "inactive" flag in bond_open, thus we can keep the correct value of them. As for bond_is_active_slave(), I'd like to prepare another patch to handle it. V2: Use C style comment. Move read_lock(&bond->curr_slave_lock). Replace restore with reset, for active backup mode, it means "restore", but for other modes, it means "reset". Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-11bonding: implement get_tx_queues rtnk_link_opJiri Pirko
If bonding device is created via rtnl, it is created with default number of rx/tx queues. This patch implements callback in bonding so the correct value (previously specified by bonding module param) is used. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-27bonding: reduce noise during initAndy Gospodarek
On Tue, Jul 26, 2011 at 05:40:27PM -0700, Joe Perches wrote: > On Tue, 2011-07-26 at 17:37 -0700, Jay Vosburgh wrote: > > Joe Perches <joe@perches.com> wrote: > > >I'd prefer you don't separate the format string > > >into multiple pieces. > > Why not? To me, it looks easier to read split into sections > > that don't wrap lines. > > Harder to grep for a dmesg and the > defect rate of these split formats is > typically higher than single strings > because of bad spacing between string > segments. > I noticed that you took some time back in late 2009 to 'consolidate' the split format-strings present in the bonding driver at the time and I've decided I'm fine to leave them the way they are. The main point of my patch was to change the output and I would like to get that included. Here is my updated patch... Subject: [PATCH net-next-2.6 v2] bonding: reduce noise during init Many are using sysfs to configure bonding rather than module options, so there is no need for bonding to throw this warning in normal cases. Keep the message around when debugging is enabled as it might be useful for someone desperate enough to enable debugging, but eliminate it otherwise. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-27net: Audit drivers to identify those needing IFF_TX_SKB_SHARING clearedNeil Horman
After the last patch, We are left in a state in which only drivers calling ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real hardware call ether_setup for their net_devices and don't hold any state in their skbs. There are a handful of drivers that violate this assumption of course, and need to be fixed up. This patch identifies those drivers, and marks them as not being able to support the safe transmission of skbs by clearning the IFF_TX_SKB_SHARING flag in priv_flags Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Karsten Keil <isdn@linux-pingi.de> CC: "David S. Miller" <davem@davemloft.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Patrick McHardy <kaber@trash.net> CC: Krzysztof Halasa <khc@pm.waw.pl> CC: "John W. Linville" <linville@tuxdriver.com> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Marcel Holtmann <marcel@holtmann.org> CC: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21bonding: do vlan cleanupJiri Pirko
Now when all devices are cleaned up, bond can be cleaned up as well - remove bond->vlgrp - remove bond_vlan_rx_register - substitute necessary occurences of vlan_group_get_device Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c
2011-07-14net: remove NETIF_F_ALL_TX_OFFLOADSMichał Mirosław
There is no software fallback implemented for SCTP or FCoE checksumming, and so it should not be passed on by software devices like bridge or bonding. For VLAN devices, this is different. First, the driver for underlying device should be prepared to get offloaded packets even when the feature is disabled (especially if it advertises it in vlan_features). Second, devices under VLANs do not get replaced without tearing down the VLAN first. This fixes a mess I accidentally introduced while converting bonding to ndo_fix_features. NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they are unused as of commit 712ae51afd. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-23bonding: add min links parameter to 802.3adstephen hemminger
This adds support for a configuring the minimum number of links that must be active before asserting carrier. It is similar to the Cisco EtherChannel min-links feature. This allows setting the minimum number of member ports that must be up (link-up state) before marking the bond device as up (carrier on). This is useful for situations where higher level services such as clustering want to ensure a minimum number of low bandwidth links are active before switchover. See: http://bugzilla.vyatta.com/show_bug.cgi?id=7196 Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-21ip: introduce ip_is_fragment helper inline functionPaul Gortmaker
There are enough instances of this: iph->frag_off & htons(IP_MF | IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-20Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c
2011-06-19netpoll: copy dev name of slaves to struct netpollWANG Cong
Otherwise we will not see the name of the slave dev in error message: [ 388.469446] (null): doesn't support polling, aborting. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-11bonding: clean up bond_del_vlan()Jiri Bohac
1) the setting of NETIF_F_VLAN_CHALLENGED in bond_del_vlan() is useless since commit b2a103e6 because bond_fix_features() now sets NETIF_F_VLAN_CHALLENGED whenever the last slave is being removed. 2) the code never triggers anyway as vlan_list is never empty since ad1afb00. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-09bonding:delete lacp_fast from ad_bond_infoPeter Pan(潘卫平)
These is also a bug, that if you modify lacp_rate via sysfs, and add new slaves in bonding, new slaves won't use the latest lacp_rate, since ad_bond_info->lacp_fast is initialized only once, in bond_3ad_initialize(). Since both struct bond_params and ad_bond_info have lacp_fast, they are duplicate and need extra synchronization. bond_3ad_bind_slave() can use bond_params->lacp_fast to initialize port. So we can just remove lacp_fast from struct ad_bond_info. Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-05bonding: reset queue mapping prior to transmission to physical device (v5)Neil Horman
The bonding driver is multiqueue enabled, in which each queue represents a slave to enable optional steering of output frames to given slaves against the default output policy. However, it needs to reset the skb->queue_mapping prior to queuing to the physical device or the physical slave (if it is multiqueue) could wind up transmitting on an unintended tx queue Change Notes: v2) Based on first pass review, updated the patch to restore the origional queue mapping that was found in bond_select_queue, rather than simply resetting to zero. This preserves the value of queue_mapping when it was set on receive in the forwarding case which is desireable. v3) Fixed spelling an casting error in skb->cb v4) fixed to store raw queue_mapping to avoid double decrement v5) Eric D requested that ->cb access be wrapped in a macro. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-02bonding: allow all slave speedsJiri Pirko
No need to check for 10, 100, 1000, 10000 explicitly. Just make this generic and check for invalid values only (similar check is in ethtool userspace app). This enables correct speed handling for slave devices with "nonstandard" speeds. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-26bonding: cleanup module option descriptionsAndy Gospodarek
Weiping Pan noticed that the module option description for xmit_hash_policy was incorrect and was nice enough to post a patch to fix it. The text was correct, but created a line over 80 characters and I would rather not add those. I realized I could take a few minutes and clean up all the descriptions and things would look much better. This is the result. Based on patch from Weiping Pan <panweiping3@gmail.com>. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-25bonding: documentation and code cleanup for resend_igmpFlavio Leitner
Improves the documentation about how IGMP resend parameter works, fix two missing checks and coding style issues. Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-25bonding: prevent deadlock on slave store with alb mode (v3)Neil Horman
This soft lockup was recently reported: [root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters [root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves bonding: bond5: doing slave updates when interface is down. bonding bond5: master_dev is not up in bond_enslave [root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves bonding: bond5: doing slave updates when interface is down. BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444] CPU 12: Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2d Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1 RIP: 0010:[<ffffffff80064bf0>] [<ffffffff80064bf0>] .text.lock.spinlock+0x26/00 RSP: 0018:ffff810113167da8 EFLAGS: 00000286 RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025 RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8 RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000 R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282 FS: 00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0 Call Trace: [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450 [<ffffffff8006457b>] __down_write_nested+0x12/0x92 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8 [<ffffffff80016b87>] vfs_write+0xce/0x174 [<ffffffff80017450>] sys_write+0x45/0x6e [<ffffffff8005d28d>] tracesys+0xd5/0xe0 It occurs because we are able to change the slave configuarion of a bond while the bond interface is down. The bonding driver initializes some data structures only after its ndo_open routine is called. Among them is the initalization of the alb tx and rx hash locks. So if we add or remove a slave without first opening the bond master device, we run the risk of trying to lock/unlock a spinlock that has garbage for data in it, which results in our above softlock. Note that sometimes this works, because in many cases an unlocked spinlock has the raw_lock parameter initialized to zero (meaning that the kzalloc of the net_device private data is equivalent to calling spin_lock_init), but thats not true in all cases, and we aren't guaranteed that condition, so we need to pass the relevant spinlocks through the spin_lock_init function. Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to the ndo_init path, so they are ready for use by the bond_store_slaves path. Change notes: v2) Based on conversation with Jay and Nicolas it seems that the ability to enslave devices while the bond master is down should be safe to do. As such this is an outlier bug, and so instead we'll just initalize the errant spinlocks in the init path rather than the open path, solving the problem. We'll also remove the warnings about the bond being down during enslave operations, since it should be safe v3) Fix spelling error Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: jtluka@redhat.com CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: nicolas.2p.debian@gmail.com CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-22net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASEAmerigo Wang
s/NETDEV_BONDING_DESLAVE/NETDEV_RELEASE/ as Andy suggested. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-22netpoll: disable netpoll when enslave a deviceAmerigo Wang
V3: rename NETDEV_ENSLAVE to NETDEV_JOIN Currently we do nothing when we enslave a net device which is running netconsole. Neil pointed out that we may get weird results in such case, so let's disable netpoll on the device being enslaved. I think it is too harsh to prevent the device being ensalved if it is running netconsole. By the way, this patch also removes the NETDEV_GOING_DOWN from netconsole netdev notifier, because netpoll will check if the device is running or not and we don't handle NETDEV_PRE_UP neither. This patch is based on net-next-2.6. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12bonding: convert to ndo_fix_featuresMichał Mirosław
This should also fix updating of vlan_features and propagating changes to VLAN devices on the bond. Side effect: it allows user to force-disable some offloads on the bond interface. Note: NETIF_F_VLAN_CHALLENGED is managed by bond_fix_features() now. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09net: bonding: factor out rlock(bond->lock) in xmit pathMichał Mirosław
Pull read_lock(&bond->lock) and BOND_IS_OK() to bond_start_xmit() from mode-dependent xmit functions. netif_running() is always true in hard_start_xmit. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-05net: call dev_alloc_name from register_netdeviceJiri Pirko
Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). That allows to remove multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name in advance remains. This also fixes veth creation regresion caused by 84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-29ipv4, ipv6, bonding: Restore control over number of peer notificationsBen Hutchings
For backward compatibility, we should retain the module parameters and sysfs attributes to control the number of peer notifications (gratuitous ARPs and unsolicited NAs) sent after bonding failover. Also, it is possible for failover to take place even though the new active slave does not have link up, and in that case the peer notification should be deferred until it does. Change ipv4 and ipv6 so they do not automatically send peer notifications on bonding failover. Change the bonding driver to send separate NETDEV_NOTIFY_PEERS notifications when the link is up, as many times as requested. Since it does not directly control which protocols send notifications, make num_grat_arp and num_unsol_na aliases for a single parameter. Bump the bonding version number and update its documentation. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Acked-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-25bonding: move processing of recv handlers into handle_frame()Jiri Pirko
Since now when bonding uses rx_handler, all traffic going into bond device goes thru bond_handle_frame. So there's no need to go back into bonding code later via ptype handlers. This patch converts original ptype handlers into "bonding receive probes". These functions are called from bond_handle_frame and they are registered per-mode. Note that vlan packets are also handled because they are always untagged thanks to vlan_untag() Note that this also allows arpmon for eth-bond-bridge-vlan topology. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-17bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like ↵Ben Hutchings
NETDEV_NOTIFY_PEERS It is undesirable for the bonding driver to be poking into higher level protocols, and notifiers provide a way to avoid that. This does mean removing the ability to configure reptitition of gratuitous ARPs and unsolicited NAs. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-17bonding: Fix set-but-unused variable.David S. Miller
The variable 'vlan_dev' is set but unused in bond_send_gratuitous_arp(). Just kill it off. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-14net-bonding: Adding support for throughputs larger than 65536 MbpsDavid Decotigny
This updates the bonding driver to support v2.6.27-rc3 enhancements (b11f8d8c aka. "ethtool: Expand ethtool_cmd.speed to 32 bits") which allow to encode the Mbps link speed on 32-bits (Max 4 Pbps) instead of 16 (Max 65536 Mbps). This patch also attempts to compact struct slave by reordering its fields. Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-14net-bonding: Fix minor/cosmetic type inconsistenciesDavid Decotigny
The __get_link_speed() function returns a u16 value which was stored in a u32 local variable. This patch uses the return value directly, thus fixing that minor type consistency. The 'duplex' field in struct slave being encoded on 8 bits, to be more consistent we use a u8 integer (instead of u16) whenever we copy it to local variables. Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-14net-bonding: Fix minor sparse complaintsDavid Decotigny
This gets rid of minor sparse complaints: drivers/net/bonding/bond_main.c:4361:4: warning: do-while statement is not a compound statement drivers/net/bonding/bond_main.c:243:12: warning: symbol 'bond_mode_name' was not declared. Should it be static? Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04net: Allow no-cache copy from user on transmitTom Herbert
This patch uses __copy_from_user_nocache on transmit to bypass data cache for a performance improvement. skb_add_data_nocache and skb_copy_to_page_nocache can be called by sendmsg functions to use this feature, initial support is in tcp_sendmsg. This functionality is configurable per device using ethtool. Presumably, this feature would only be useful when the driver does not touch the data. The feature is turned on by default if a device indicates that it does some form of checksum offload; it is off by default for devices that do no checksum offload or indicate no checksum is necessary. For the former case copy-checksum is probably done anyway, in the latter case the device is likely loopback in which case the no cache copy is probably not beneficial. This patch was tested using 200 instances of netperf TCP_RR with 1400 byte request and one byte reply. Platform is 16 core AMD x86. No-cache copy disabled: 672703 tps, 97.13% utilization 50/90/99% latency:244.31 484.205 1028.41 No-cache copy enabled: 702113 tps, 96.16% utilization, 50/90/99% latency 238.56 467.56 956.955 Using 14000 byte request and response sizes demonstrate the effects more dramatically: No-cache copy disabled: 79571 tps, 34.34 %utlization 50/90/95% latency 1584.46 2319.59 5001.76 No-cache copy enabled: 83856 tps, 34.81% utilization 50/90/95% latency 2508.42 2622.62 2735.88 Note especially the effect on latency tail (95th percentile). This seems to provide a nice performance improvement and is consistent in the tests I ran. Presumably, this would provide the greatest benfits in the presence of an application workload stressing the cache and a lot of transmit data happening. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23bonding: fix rx_handler lockingJiri Pirko
This prevents possible race between bond_enslave and bond_handle_frame as reported by Nicolas by moving rx_handler register/unregister. slave->bond is added to hold pointer to master bonding sructure. That way dev->master is no longer used in bond_handler_frame. Also, this removes "BUG: scheduling while atomic" message Reported-by: Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Tested-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-19bonding: fix a typo in a commentNicolas de Pesloüan
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16bonding: enable netpoll without checking link statusAndy Gospodarek
Only slaves that are up should transmit netpoll frames, so there is no need to check to see if a slave is up before enabling netpoll on it. This resolves a reported failure on active-backup bonds where a slave interface is down when netpoll was enabled. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Tested-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16net: introduce rx_handler results and logic around thatJiri Pirko
This patch allows rx_handlers to better signalize what to do next to it's caller. That makes skb->deliver_no_wcard no longer needed. kernel-doc for rx_handler_result is taken from Nicolas' patch. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flagJiri Pirko
Since bond-related code was moved from net/core/dev.c into bonding, IFF_SLAVE_INACTIVE is no longer needed. Replace is with flag "inactive" stored in slave structure Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16bonding: wrap slave state workJiri Pirko
transfers slave->state into slave->backup (that it's going to transfer into bitfield. Introduce wrapper inlines to do the work with it. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16net: get rid of multiple bond-related netdevice->priv_flagsJiri Pirko
Now when bond-related code is moved from net/core/dev.c into bonding code, multiple priv_flags are not needed anymore. So let them rot. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16bonding: register slave pointer for rx_handlerJiri Pirko
Register slave pointer as rx_handler data. That would eventually prevent need to loop over slave devices to find the right slave. Use synchronize_net to ensure that bond_handle_frame does not get slave structure freed when working with that. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>