aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/loopback.c
AgeCommit message (Collapse)Author
2022-05-16net: loopback: enable BIG TCP packetsEric Dumazet
Set the driver limit to GSO_MAX_SIZE (512 KB). This allows the admin/user to set a GSO limit up to this value. Tested: ip link set dev lo gso_max_size 200000 netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 & tcpdump shows : 18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000 18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0 18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000 18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0 18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000 18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0 18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000 18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: Add skb_clear_tstamp() to keep the mono delivery_timeMartin KaFai Lau
Right now, skb->tstamp is reset to 0 whenever the skb is forwarded. If skb->tstamp has the mono delivery_time, clearing it can hurt the performance when it finally transmits out to fq@phy-dev. The earlier patch added a skb->mono_delivery_time bit to flag the skb->tstamp carrying the mono delivery_time. This patch adds skb_clear_tstamp() helper which keeps the mono delivery_time and clears everything else. The delivery_time clearing will be postponed until the stack knows the skb will be delivered locally. It will be done in a latter patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14net: dev: Makes sure netif_rx() can be invoked in any context.Sebastian Andrzej Siewior
Dave suggested a while ago (eleven years by now) "Let's make netif_rx() work in all contexts and get rid of netif_rx_ni()". Eric agreed and pointed out that modern devices should use netif_receive_skb() to avoid the overhead. In the meantime someone added another variant, netif_rx_any_context(), which behaves as suggested. netif_rx() must be invoked with disabled bottom halves to ensure that pending softirqs, which were raised within the function, are handled. netif_rx_ni() can be invoked only from process context (bottom halves must be enabled) because the function handles pending softirqs without checking if bottom halves were disabled or not. netif_rx_any_context() invokes on the former functions by checking in_interrupts(). netif_rx() could be taught to handle both cases (disabled and enabled bottom halves) by simply disabling bottom halves while invoking netif_rx_internal(). The local_bh_enable() invocation will then invoke pending softirqs only if the BH-disable counter drops to zero. Eric is concerned about the overhead of BH-disable+enable especially in regard to the loopback driver. As critical as this driver is, it will receive a shortcut to avoid the additional overhead which is not needed. Add a local_bh_disable() section in netif_rx() to ensure softirqs are handled if needed. Provide __netif_rx() which does not disable BH and has a lockdep assert to ensure that interrupts are disabled. Use this shortcut in the loopback driver and in drivers/net/*.c. Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they can be removed once they are no more users left. Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2019-11-07net: use u64_stats_t in struct pcpu_lstatsEric Dumazet
In order to fix the data-race found by KCSAN, we can use the new u64_stats_t type and its accessors instead of plain u64 fields. This will still generate optimal code for both 32 and 64 bit platforms. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: provide dev_lstats_add() helperEric Dumazet
Many network drivers need it and hand-coded the same function. In order to ease u64_stats_t adoption, it is time to factorize. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: provide dev_lstats_read() helperEric Dumazet
Many network drivers use hand-coded implementation of the same thing, let's factorize things so that u64_stats_t adoption is done once. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03loopback: fix lockdep splatMahesh Bandewar
dev_init_scheduler() and dev_activate() expect the caller to hold RTNL. Since we don't want blackhole device to be initialized per ns, we are initializing at init. [ 3.855027] Call Trace: [ 3.855034] dump_stack+0x67/0x95 [ 3.855037] lockdep_rcu_suspicious+0xd5/0x110 [ 3.855044] dev_init_scheduler+0xe3/0x120 [ 3.855048] ? net_olddevs_init+0x60/0x60 [ 3.855050] blackhole_netdev_init+0x45/0x6e [ 3.855052] do_one_initcall+0x6c/0x2fa [ 3.855058] ? rcu_read_lock_sched_held+0x8c/0xa0 [ 3.855066] kernel_init_freeable+0x1e5/0x288 [ 3.855071] ? rest_init+0x260/0x260 [ 3.855074] kernel_init+0xf/0x180 [ 3.855076] ? rest_init+0x260/0x260 [ 3.855078] ret_from_fork+0x24/0x30 Fixes: 4de83b88c66 ("loopback: create blackhole net device similar to loopack.") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01loopback: create blackhole net device similar to loopack.Mahesh Bandewar
Create a blackhole net device that can be used for "dead" dst entries instead of loopback device. This blackhole device differs from loopback in few aspects: (a) It's not per-ns. (b) MTU on this device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb(). and (d) since it's not registered it won't have ifindex. Lower MTU effectively make the device not pass the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-12net: loopback: use generic helper to report timestamping infoJulian Wiedmann
For reporting the common set of SW timestamping capabilities, use ethtool_op_get_ts_info() instead of re-implementing it. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-20net: loopback: clear skb->tstamp before netif_rx()Eric Dumazet
At least UDP / TCP stacks can now cook skbs with a tstamp using MONOTONIC base (or arbitrary values with SCM_TXTIME) Since loopback driver does not call (directly or indirectly) skb_scrub_packet(), we need to clear skb->tstamp so that net_timestamp_check() can eventually resample the time, using ktime_get_real(). Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14net: move definition of pcpu_lstats to header fileLi RongQing
pcpu_lstats is defined in several files, so unify them as one and move to header file Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21Cleanup some warning from timestamping code.Ezequiel Lara Gomez
Following checkpatch.pl recommendations (which include replacing with <linux/io.h> the <asm/io.h>, since linux/io.h includes it). Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21Enable tx timestamping on loopback and dummyEzequiel Lara Gomez
This enables developing code that uses SOF_TIMESTAMPING_TX_SOFTWARE by using localhost addresses (without needing to send packets outside), as well as enabling unit and functional testing of TX timestamping code without needing hardware support or network access. It also fulfills the expectation of software network devices supporting software-based timestamping. Tested on qemu using txtimestamping.c from the kernel selftests, and ethtool -T. Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-02-08net: introduce device min_header_lenWillem de Bruijn
The stack must not pass packets to device drivers that are shorter than the minimum link layer header length. Previously, packet sockets would drop packets smaller than or equal to dev->hard_header_len, but this has false positives. Zero length payload is used over Ethernet. Other link layer protocols support variable length headers. Support for validation of these protocols removed the min length check for all protocols. Introduce an explicit dev->min_header_len parameter and drop all packets below this value. Initially, set it to non-zero only for Ethernet and loopback. Other protocols can follow in a patch to net-next. Fixes: 9ed988cd5915 ("packet: validate variable length ll headers") Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08net: make ndo_get_stats64 a void functionstephen hemminger
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03loopback: make use of NETIF_F_GSO_SOFTWAREMarcelo Ricardo Leitner
NETIF_F_GSO_SOFTWARE was defined to list all GSO software types, so lets make use of it in loopback code. Note that veth/vxlan/others already uses it. Within this patch series, this patch causes lo to pick up SCTP GSO feature automatically (as it's added to NETIF_F_GSO_SOFTWARE) and thus avoiding segmentation if possible. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRCTom Herbert
The SCTP checksum is really a CRC and is very different from the standards 1's complement checksum that serves as the checksum for IP protocols. This offload interface is also very different. Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these differences. The term CSUM should be reserved in the stack to refer to the standard 1's complement IP checksum. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18net: loopback: convert to using IFF_NO_QUEUEPhil Sutter
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07net: better IFF_XMIT_DST_RELEASE supportEric Dumazet
Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15net: set name_assign_type in alloc_netdev()Tom Gundersen
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irqEric W. Biederman
Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-24loopback: sctp: add NETIF_F_SCTP_CSUM to device featuresDaniel Borkmann
Drivers are allowed to set NETIF_F_SCTP_CSUM if they have hardware crc32c checksumming support for the SCTP protocol. Currently, NETIF_F_SCTP_CSUM flag is available in igb, ixgbe, i40e/i40evf drivers and for vlan devices. If we don't have NETIF_F_SCTP_CSUM then crc32c is done through CPU instructions, invoked from crypto layer, or if not available as slow-path fallback in software. Currently, loopback device propagates checksum offloading feature flags in dev->features, but is missing SCTP checksum offloading. Therefore, account for NETIF_F_SCTP_CSUM as well. Before patch: ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 4194304 4194304 4096 10.00 4683.50 After patch: ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 4194304 4194304 4096 10.00 15348.26 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14net: introduce netdev_alloc_pcpu_stats() for driversWANG Cong
There are many drivers calling alloc_percpu() to allocate pcpu stats and then initializing ->syncp. So just introduce a helper function for them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13net: allow setting mac address of loopback deviceWANG Cong
We are trying to mirror the local traffic from lo to eth0, allowing setting mac address of lo to eth0 would make the ether addresses in these packets correct, so that we don't have to modify the ether header again. Since usually no one cares about its mac address (all-zero), it is safe to allow those who care to set its mac address. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16drivers/net: delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. This covers everything under drivers/net except for wireless, which has been submitted separately. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-06net: Explicitly initialize u64_stats_sync structures for lockdepJohn Stultz
In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-17net loopback: Set loopback_dev to NULL when freedEric W. Biederman
It has recently turned up that we have a number of long standing bugs in the network stack cleanup code with use of the loopback device after it has been freed that have not turned up because in most cases the storage allocated to the loopback device is not reused, when those accesses happen. Set looback_dev to NULL to trigger oopses instead of silent data corrupt when we hit this class of bug. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27net: loopback: fix a dst refcounting issueEric Dumazet
Ben Greear reported crashes in ip_rcv_finish() on a stress test involving many macvlans. We tracked the bug to a dst use after free. ip_rcv_finish() was calling dst->input() and got garbage for dst->input value. It appears the bug is in loopback driver, lacking a skb_dst_force() before calling netif_rx(). As a result, a non refcounted dst, normally protected by a RCU read_lock section, was escaping this section and could be freed before the packet being processed. [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83 [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6 [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7 [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d [<ffffffff814ad90e>] ip_finish_output+0x63/0x68 [<ffffffff814ae412>] ip_output+0x61/0x67 [<ffffffff814ab904>] dst_output+0x17/0x1b [<ffffffff814adb6d>] ip_local_out+0x1e/0x23 [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353 [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b [<ffffffff814c3571>] tcp_connect+0x53c/0x587 [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431 [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287 [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49 [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11 [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79 [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49 [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49 [<ffffffff814632c6>] sys_connect+0x75/0x98 This bug was introduced in linux-2.6.35, in commit 7fee226ad2397b (net: add a noref bit on skb dst) skb_dst_force() is enforced in dev_queue_xmit() for devices having a qdisc. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24net: loopback: set default mtu to 64KEric Dumazet
loopback current mtu of 16436 bytes allows no more than 3 MSS TCP segments per frame, or 48 Kbytes. Changing mtu to 64K allows TCP stack to build large frames and significantly reduces stack overhead. Performance boost on bulk TCP transferts can be up to 30 %, partly because we now have one ACK message for two 64KB segments, and a lower probability of hitting /proc/sys/net/ipv4/tcp_reordering default limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09net: Loopback ifindex is constant nowPavel Emelyanov
As pointed out, there are places, that access net->loopback_dev->ifindex and after ifindex generation is made per-net this value becomes constant equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use it where appropriate. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22net: fix race condition in several drivers when reading statsKevin Groeneveld
Fix race condition in several network drivers when reading stats on 32bit UP architectures. These drivers update their stats in a BH context and therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the stats. Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-28Remove all #inclusions of asm/system.hDavid Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2011-11-16net: remove NETIF_F_NO_CSUM feature bitMichał Mirosław
Only distinct use is checking if NETIF_F_NOCACHE_COPY should be enabled by default. The check heuristics is altered a bit here, so it hits other people than before. The default shouldn't be trusted for performance-critical cases anyway. For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08net: Allow ethtool to set interface in loopback mode.Mahesh Bandewar
This patch enables ethtool to set the loopback mode on a given interface. By configuring the interface in loopback mode in conjunction with a policy route / rule, a userland application can stress the egress / ingress path exposing the flows of the change in progress and potentially help developer(s) understand the impact of those changes without even sending a packet out on the network. Following set of commands illustrates one such example - a) ip -4 addr add 192.168.1.1/24 dev eth1 b) ip -4 rule add from all iif eth1 lookup 250 c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250 d) arp -Ds 192.168.1.100 eth1 e) arp -Ds 192.168.1.200 eth1 f) sysctl -w net.ipv4.ip_nonlocal_bind=1 g) sysctl -w net.ipv4.conf.all.accept_local=1 # Assuming that the machine has 8 cores h) taskset 000f netserver -L 192.168.1.200 i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-17ip6_pol_route panic: Do not allow VLAN on loopbackKrishna Kumar
Several tests in the ipv6 routing code check IFF_LOOPBACK, and allowing stacking such as VLAN'ing on top of loopback results in a netdevice which reports IFF_LOOPBACK but really isn't the loopback device. Instead of spamming the ipv6 routing code with even more special tests, simply disallow VLAN over loopback. The result of this patch is: # modprobe 8021q # vconfig add lo 43 ERROR: trying to add VLAN #43 to IF -:lo:- error: Operation not supported Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17loopback: convert to hw_featuresMichał Mirosław
This also enables TSOv6, TSO-ECN, and UFO as loopback clearly can handle them. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05net: add a core netdev->rx_dropped counterEric Dumazet
In various situations, a device provides a packet to our stack and we drop it before it enters protocol stack : - softnet backlog full (accounted in /proc/net/softnet_stat) - bad vlan tag (not accounted) - unknown/unregistered protocol (not accounted) We can handle a per-device counter of such dropped frames at core level, and automatically adds it to the device provided stats (rx_dropped), so that standard tools can be used (ifconfig, ip link, cat /proc/net/dev) This is a generalization of commit 8990f468a (net: rx_dropped accounting), thus reverting it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-26net: loopback driver cleanupEric Dumazet
loopback driver uses dev->ml_priv to store its percpu stats pointer. It uses ugly casts "(void __percpu __force *)" to shut up sparse complains. Define an union to better document we use ml_priv in loopback driver and define a lstats field with appropriate types. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-07net: fix 64 bit counters on 32 bit archesEric Dumazet
There is a small possibility that a reader gets incorrect values on 32 bit arches. SNMP applications could catch incorrect counters when a 32bit high part is changed by another stats consumer/provider. One way to solve this is to add a rtnl_link_stats64 param to all ndo_get_stats64() methods, and also add such a parameter to dev_get_stats(). Rule is that we are not allowed to use dev->stats64 as a temporary storage for 64bit stats, but a caller provided area (usually on stack) Old drivers (only providing get_stats() method) need no changes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-25loopback: use u64_stats_sync infrastructureEric Dumazet
Commit 6b10de38f0ef (loopback: Implement 64bit stats on 32bit arches) introduced 64bit stats in loopback driver, using a private seqcount and private helpers. David suggested to introduce a generic infrastructure, added in (net: Introduce u64_stats_sync infrastructure) This patch reimplements loopback 64bit stats using the u64_stats_sync infrastructure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-14loopback: Implement 64bit stats on 32bit archesEric Dumazet
Uses a seqcount_t to synchronize stat producer and consumer, for packets and bytes counter, now u64 types. (dropped counter being rarely used, stay a native "unsigned long" type) No noticeable performance impact on x86, as it only adds two increments per frame. It might be more expensive on arches where smp_wmb() is not free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16percpu: add __percpu sparse annotations to net driversTejun Heo
Add __percpu sparse annotations to net drivers. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
2009-12-01net: Simplify loopback and improve batching.Eric W. Biederman
Defer calling unregister_netdevice_queue to cleanup_net. It's simpler and it allows the loopback device to land in the same batch as other network devices. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-25net: use net_eq to compare netsOctavian Purdila
Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>