aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2012-08-06Bluetooth: Added /proc/net/sco via bt_procfs_init()Masatake YAMATO
Added /proc/net/sco via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Added /proc/net/rfcomm via bt_procfs_init()Masatake YAMATO
Added /proc/net/rfcomm via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Added /proc/net/l2cap via bt_procfs_init()Masatake YAMATO
Added /proc/net/l2cap via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Added /proc/net/hidp via bt_procfs_init()Masatake YAMATO
Added /proc/net/hidp via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Added /proc/net/hci via bt_procfs_init()Masatake YAMATO
Added /proc/net/hci via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Added /proc/net/cmtp via bt_procfs_init()Masatake YAMATO
Added /proc/net/cmtp via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Added /proc/net/bnep via bt_procfs_init()Masatake YAMATO
Added /proc/net/bnep via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: /proc/net/ entries for bluetooth protocolsMasatake YAMATO
lsof command can tell the type of socket processes are using. Internal lsof uses inode numbers on socket fs to resolve the type of sockets. Files under /proc/net/, such as tcp, udp, unix, etc provides such inode information. Unfortunately bluetooth related protocols don't provide such inode information. This patch series introduces /proc/net files for the protocols. This patch against af_bluetooth.c provides facility to the implementation of protocols. This patch extends bt_sock_list and introduces two exported function bt_procfs_init, bt_procfs_cleanup. The type bt_sock_list is already used in some of implementation of protocols. bt_procfs_init prepare seq_operations which converts protocol own bt_sock_list data to protocol own proc entry when the entry is accessed. What I, lsof user, need is just inode number of bluetooth socket. However, people may want more information. The bt_procfs_init takes a function pointer for customizing the show handler of seq_operations. In v4 patch, __acquires and __releases attributes are added to suppress sparse warning. Suggested by Andrei Emeltchenko. In v5 patch, linux/proc_fs.h is included to use PDE. Build error is reported by Fengguang Wu. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Free the l2cap channel list only when refcount is zeroJaganath Kanakkassery
Move the l2cap channel list chan->global_l under the refcnt protection and free it based on the refcnt. Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Move l2cap_chan_hold/put to l2cap_core.cJaganath Kanakkassery
Refactor the code in order to use the l2cap_chan_destroy() from l2cap_chan_put() under the refcnt protection. Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Remove locking in hci_user_passkey_request_evtAndre Guedes
This patch removes hdev locking in hci_user_passkey_request_evt since it is not needed. mgmt_user_passkey_request simply calls mgmt_event which does not require hdev locking at all. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Make connect / disconnect cfm functions return voidAndrei Emeltchenko
Return values are never used because callers hci_proto_connect_cfm and hci_proto_disconn_cfm return void. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_no_flush_capable where applicableAndre Guedes
This patch replaces all LMP_NO_FLUSH bit checking by the helper macro lmp_no_flush_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_sniffsubr_capable where applicableAndre Guedes
This patch replaces all LMP_SNIFF_SUBR bit checking by the helper macro lmp_sniffsubr_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_sniff_capable where applicableAndre Guedes
This patch replaces all LMP_SNIFF bit checking by the helper macro lmp_sniff_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_rswitch_capable where applicableAndre Guedes
This patch replaces all LMP_RSWITCH bit checking by the helper macro lmp_rswitch_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_esco_capable where applicableAndre Guedes
This patch replaces all LMP_ESCO bit checking by the helper macro lmp_esco_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_ssp_capable where applicableAndre Guedes
This patch replaces all LMP_SIMPLE_PAIR bit checking by the helper macro lmp_ssp_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_le_capable where applicableAndre Guedes
This patch replaces all LMP_LE bit checking by the helper macro lmp_le_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Use lmp_bredr_capable where applicableAndre Guedes
This patch replaces all LMP_NO_BREDR bit checking by the helper macro lmp_bredr_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Fix processing A2MP chan in security_cfmAndrei Emeltchenko
Do not process A2MP channel in l2cap_security_cfm Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: Do not shadow hdr variableAndrei Emeltchenko
Fix compile warnings below: ... net/bluetooth/a2mp.c:505:33: warning: symbol 'hdr' shadows an earlier one net/bluetooth/a2mp.c:498:25: originally declared here ... Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: debug: Fix printing A2MP cmd code formatAndrei Emeltchenko
Print A2MP code format according to Bluetooth style. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-06Bluetooth: mgmt: Managing only BR/EDR HCI controllersAndrei Emeltchenko
Add check that HCI controller is BR/EDR. AMP controller shall not be managed by mgmt interface and consequently user space. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-07-26Merge branch 'for-3.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds Pull LED subsystem update from Bryan Wu. * 'for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: (50 commits) leds-lp8788: forgotten unlock at lp8788_led_work LEDS: propagate error codes in blinkm_detect() LEDS: memory leak in blinkm_led_common_set() leds: add new lp8788 led driver LEDS: add BlinkM RGB LED driver, documentation and update MAINTAINERS leds: max8997: Simplify max8997_led_set_mode implementation leds/leds-s3c24xx: use devm_gpio_request leds: convert Network Space v2 LED driver to devm_kzalloc() and cleanup error exit path leds: convert DAC124S085 LED driver to devm_kzalloc() leds: convert LM3530 LED driver to devm_kzalloc() and cleanup error exit path leds: convert TCA6507 LED driver to devm_kzalloc() leds: convert Freescale MC13783 LED driver to devm_kzalloc() and cleanup error exit path leds: convert ADP5520 LED driver to devm_kzalloc() and cleanup error exit path leds: convert PCA955x LED driver to devm_kzalloc() and cleanup error exit path leds: convert Sun Fire LED driver to devm_kzalloc() and cleanup error exit path leds: convert PCA9532 LED driver to devm_kzalloc() leds: convert LT3593 LED driver to devm_kzalloc() leds: convert Renesas TPU LED driver to devm_kzalloc() and cleanup error exit path leds: convert LP5523 LED driver to devm_kzalloc() and cleanup error exit path leds: convert PCA9633 LED driver to devm_kzalloc() ...
2012-07-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking updates and fixes from David Miller: 1) Reinstate the no-ref optimization for input route lookups in ipv4 to fix some routing cache removal perf regressions. 2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet. 3) Get RX hash value from correct place in be2net driver, from Sarveshwar Bandi. 4) Validation of FIB cached routes missing critical check, from Eric Dumazet. 5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits) ipv6: Early TCP socket demux ipv4: Fix input route performance regression. pch_gbe: vlan skb len fix pch_gbe: add extra clean tx pch_gbe: fix transmit watchdog timeout ixgbe: fix panic while dumping packets on Tx hang with IOMMU be2net: Fix to parse RSS hash from Receive completions correctly. net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER hyperv: Add error handling to rndis_filter_device_add() hyperv: Add a check for ring_size value ipv4: rt_cache_valid must check expired routes net/pch_gpe: Cannot disable ethernet autonegation qeth: repair crash in qeth_l3_vlan_rx_kill_vid() netiucv: cleanup attribute usage net: wiznet add missing HAS_IOMEM dependency be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC mlx4: Add support for EEH error recovery cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN wanmain: comparing array with NULL caif: fix NULL pointer check ...
2012-07-26ipv6: Early TCP socket demuxEric Dumazet
This is the IPv6 missing bits for infrastructure added in commit 41063e9dd1195 (ipv4: Early TCP socket demux.) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26ipv4: Fix input route performance regression.David S. Miller
With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads. Reinstate the noref path when we hit a cached route in the FIB nexthops. With help from Eric Dumazet. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-25ipv4: rt_cache_valid must check expired routesEric Dumazet
commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.) introduced rt_cache_valid() helper. It unfortunately doesn't check if route is expired before caching it. I noticed sk_setup_caps() was constantly called on a tcp workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-24wanmain: comparing array with NULLAlan Cox
gcc really should warn about these ! Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-24tcp: early_demux fixesEric Dumazet
1) Remove a non needed pskb_may_pull() in tcp_v4_early_demux() and fix a potential bug if skb->head was reallocated (iph & th pointers were not reloaded) TCP stack will pull/check headers anyway. 2) must reload iph in ip_rcv_finish() after early_demux() call since skb->head might have changed. 3) skb->dev->ifindex can be now replaced by skb->skb_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Trivial updates all over the place as usual." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits) Fix typo in include/linux/clk.h . pci: hotplug: Fix typo in pci iommu: Fix typo in iommu video: Fix typo in drivers/video Documentation: Add newline at end-of-file to files lacking one arm,unicore32: Remove obsolete "select MISC_DEVICES" module.c: spelling s/postition/position/g cpufreq: Fix typo in cpufreq driver trivial: typo in comment in mksysmap mach-omap2: Fix typo in debug message and comment scsi: aha152x: Fix sparse warning and make printing pointer address more portable. Change email address for Steve Glendinning Btrfs: fix typo in convert_extent_bit via: Remove bogus if check netprio_cgroup.c: fix comment typo backlight: fix memory leak on obscure error path Documentation: asus-laptop.txt references an obsolete Kconfig item Documentation: ManagementStyle: fixed typo mm/vmscan: cleanup comment error in balance_pgdat mm: cleanup on the comments of zone_reclaim_stat ...
2012-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking changes from David S Miller: 1) Remove the ipv4 routing cache. Now lookups go directly into the FIB trie and use prebuilt routes cached there. No more garbage collection, no more rDOS attacks on the routing cache. Instead we now get predictable and consistent performance, no matter what the pattern of traffic we service. This has been almost 2 years in the making. Special thanks to Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who have helped along the way. I'm sure that with a change of this magnitude there will be some kind of fallout, but such things ought the be simple to fix at this point. Luckily I'm not European so I'll be around all of August to fix things :-) The major stages of this work here are each fronted by a forced merge commit whose commit message contains a top-level description of the motivations and implementation issues. 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on input. 3) TCP SYN/ACK performance tweaks from Eric Dumazet. 4) Add namespace support for netfilter L4 conntrack helpers, from Gao Feng. 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from Yuval Mintz. 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet. 7) Support for connection tracker helpers in userspace, from Pablo Neira Ayuso. 8) Allow userspace driven TX load balancing functions in TEAM driver, from Jiri Pirko. 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with embedded gotos. 10) TCP Small Queues, essentially minimize the amount of TCP data queued up in the packet scheduler layer. Whereas the existing BQL (Byte Queue Limits) limits the pkt_sched --> netdevice queuing levels, this controls the TCP --> pkt_sched queueing levels. From Eric Dumazet. 11) Reduce the number of get_page/put_page ops done on SKB fragments, from Alexander Duyck. 12) Implement protection against blind resets in TCP (RFC 5961), from Eric Dumazet. 13) Support the client side of TCP Fast Open, basically the ability to send data in the SYN exchange, from Yuchung Cheng. Basically, the sender queues up data with a sendmsg() call using MSG_FASTOPEN, then they do the connect() which emits the queued up fastopen data. 14) Avoid all the problems we get into in TCP when timers or PMTU events hit a locked socket. The TCP Small Queues changes added a tcp_release_cb() that allows us to queue work up to the release_sock() caller, and that's what we use here too. From Eric Dumazet. 15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits) genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP r8169: revert "add byte queue limit support". ipv4: Change rt->rt_iif encoding. net: Make skb->skb_iif always track skb->dev ipv4: Prepare for change of rt->rt_iif encoding. ipv4: Remove all RTCF_DIRECTSRC handliing. ipv4: Really ignore ICMP address requests/replies. decnet: Don't set RTCF_DIRECTSRC. net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. ipv4: Remove redundant assignment rds: set correct msg_namelen openvswitch: potential NULL deref in sample() tcp: dont drop MTU reduction indications bnx2x: Add new 57840 device IDs tcp: avoid oops in tcp_metrics and reset tcpm_stamp niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value niu: Fix to check for dma mapping errors. net: Fix references to out-of-scope variables in put_cmsg_compat() net: ethernet: davinci_emac: add pm_runtime support net: ethernet: davinci_emac: Remove unnecessary #include ...
2012-07-24genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEPWANG Cong
lockdep_is_held() is defined when CONFIG_LOCKDEP, not CONFIG_PROVE_LOCKING. Cc: "David S. Miller" <davem@davemloft.net> Cc: Jesse Gross <jesse@nicira.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-24leds: Rename led_brightness_set() to led_set_brightness()Shuah Khan
Rename leds external interface led_brightness_set() to led_set_brightness(). This is the second phase of the change to reduce confusion between the leds internal and external interfaces that set brightness. With this change, now the external interface is led_set_brightness(). The first phase renamed the internal interface led_set_brightness() to __led_set_brightness(). There are no changes to the interface implementations. Signed-off-by: Shuah Khan <shuahkhan@gmail.com> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
2012-07-23ipv4: Change rt->rt_iif encoding.David S. Miller
On input packet processing, rt->rt_iif will be zero if we should use skb->dev->ifindex. Since we access rt->rt_iif consistently via inet_iif(), that is the only spot whose interpretation have to adjust. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23net: Make skb->skb_iif always track skb->devDavid S. Miller
Make it follow device decapsulation, from things such as VLAN and bonding. The stuff that actually cares about pre-demuxed device pointers, is handled by the "orig_dev" variable in __netif_receive_skb(). And the only consumer of that is the po->origdev feature of AF_PACKET sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23ipv4: Prepare for change of rt->rt_iif encoding.David S. Miller
Use inet_iif() consistently, and for TCP record the input interface of cached RX dst in inet sock. rt->rt_iif is going to be encoded differently, so that we can legitimately cache input routes in the FIB info more aggressively. When the input interface is "use SKB device index" the rt->rt_iif will be set to zero. This forces us to move the TCP RX dst cache installation into the ipv4 specific code, and as well it should since doing the route caching for ipv6 is pointless at the moment since it is not inspected in the ipv6 input paths yet. Also, remove the unlikely on dst->obsolete, all ipv4 dsts have obsolete set to a non-zero value to force invocation of the check callback. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23ipv4: Remove all RTCF_DIRECTSRC handliing.David S. Miller
The last and final kernel user, ICMP address replies, has been removed. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23ipv4: Really ignore ICMP address requests/replies.David S. Miller
Alexey removed kernel side support for requests, and the only thing we do for replies is log a message if something doesn't look right. As Alexey's comment indicates, this belongs in userspace (if anywhere), and thus we can safely just get rid of this code. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23decnet: Don't set RTCF_DIRECTSRC.David S. Miller
It's an ipv4 defined route flag, and only ipv4 uses it. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.Saurabh
With CONFIG_SPARSE_RCU_POINTER=y sparse identified references which did not specificy __rcu in ip_vti.c Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23ipv4: Remove redundant assignmentLin Ming
It is redundant to set no_addr and accept_local to 0 and then set them with other values just after that. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull the big VFS changes from Al Viro: "This one is *big* and changes quite a few things around VFS. What's in there: - the first of two really major architecture changes - death to open intents. The former is finally there; it was very long in making, but with Miklos getting through really hard and messy final push in fs/namei.c, we finally have it. Unlike his variant, this one doesn't introduce struct opendata; what we have instead is ->atomic_open() taking preallocated struct file * and passing everything via its fields. Instead of returning struct file *, it returns -E... on error, 0 on success and 1 in "deal with it yourself" case (e.g. symlink found on server, etc.). See comments before fs/namei.c:atomic_open(). That made a lot of goodies finally possible and quite a few are in that pile: ->lookup(), ->d_revalidate() and ->create() do not get struct nameidata * anymore; ->lookup() and ->d_revalidate() get lookup flags instead, ->create() gets "do we want it exclusive" flag. With the introduction of new helper (kern_path_locked()) we are rid of all struct nameidata instances outside of fs/namei.c; it's still visible in namei.h, but not for long. Come the next cycle, declaration will move either to fs/internal.h or to fs/namei.c itself. [me, miklos, hch] - The second major change: behaviour of final fput(). Now we have __fput() done without any locks held by caller *and* not from deep in call stack. That obviously lifts a lot of constraints on the locking in there. Moreover, it's legal now to call fput() from atomic contexts (which has immediately simplified life for aio.c). We also don't need anti-recursion logics in __scm_destroy() anymore. There is a price, though - the damn thing has become partially asynchronous. For fput() from normal process we are guaranteed that pending __fput() will be done before the caller returns to userland, exits or gets stopped for ptrace. For kernel threads and atomic contexts it's done via schedule_work(), so theoretically we might need a way to make sure it's finished; so far only one such place had been found, but there might be more. There's flush_delayed_fput() (do all pending __fput()) and there's __fput_sync() (fput() analog doing __fput() immediately). I hope we won't need them often; see warnings in fs/file_table.c for details. [me, based on task_work series from Oleg merged last cycle] - sync series from Jan - large part of "death to sync_supers()" work from Artem; the only bits missing here are exofs and ext4 ones. As far as I understand, those are going via the exofs and ext4 trees resp.; once they are in, we can put ->write_super() to the rest, along with the thread calling it. - preparatory bits from unionmount series (from dhowells). - assorted cleanups and fixes all over the place, as usual. This is not the last pile for this cycle; there's at least jlayton's ESTALE work and fsfreeze series (the latter - in dire need of fixes, so I'm not sure it'll make the cut this cycle). I'll probably throw symlink/hardlink restrictions stuff from Kees into the next pile, too. Plus there's a lot of misc patches I hadn't thrown into that one - it's large enough as it is..." * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits) ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() switch dentry_open() to struct path, make it grab references itself spufs: shift dget/mntget towards dentry_open() zoran: don't bother with struct file * in zoran_map ecryptfs: don't reinvent the wheels, please - use struct completion don't expose I_NEW inodes via dentry->d_inode tidy up namei.c a bit unobfuscate follow_up() a bit ext3: pass custom EOF to generic_file_llseek_size() ext4: use core vfs llseek code for dir seeks vfs: allow custom EOF in generic_file_llseek code vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes vfs: Remove unnecessary flushing of block devices vfs: Make sys_sync writeout also block device inodes vfs: Create function for iterating over block devices vfs: Reorder operations during sys_sync quota: Move quota syncing to ->sync_fs method quota: Split dquot_quota_sync() to writeback and cache flushing part vfs: Move noop_backing_dev_info check from sync into writeback ...
2012-07-23rds: set correct msg_namelenWeiping Pan
Jay Fenlason (fenlason@redhat.com) found a bug, that recvfrom() on an RDS socket can return the contents of random kernel memory to userspace if it was called with a address length larger than sizeof(struct sockaddr_in). rds_recvmsg() also fails to set the addr_len paramater properly before returning, but that's just a bug. There are also a number of cases wher recvfrom() can return an entirely bogus address. Anything in rds_recvmsg() that returns a non-negative value but does not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path at the end of the while(1) loop will return up to 128 bytes of kernel memory to userspace. And I write two test programs to reproduce this bug, you will see that in rds_server, fromAddr will be overwritten and the following sock_fd will be destroyed. Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is better to make the kernel copy the real length of address to user space in such case. How to run the test programs ? I test them on 32bit x86 system, 3.5.0-rc7. 1 compile gcc -o rds_client rds_client.c gcc -o rds_server rds_server.c 2 run ./rds_server on one console 3 run ./rds_client on another console 4 you will see something like: server is waiting to receive data... old socket fd=3 server received data from client:data from client msg.msg_namelen=32 new socket fd=-1067277685 sendmsg() : Bad file descriptor /***************** rds_client.c ********************/ int main(void) { int sock_fd; struct sockaddr_in serverAddr; struct sockaddr_in toAddr; char recvBuffer[128] = "data from client"; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if (sock_fd < 0) { perror("create socket error\n"); exit(1); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4001); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind() error\n"); close(sock_fd); exit(1); } memset(&toAddr, 0, sizeof(toAddr)); toAddr.sin_family = AF_INET; toAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); toAddr.sin_port = htons(4000); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = strlen(recvBuffer) + 1; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendto() error\n"); close(sock_fd); exit(1); } printf("client send data:%s\n", recvBuffer); memset(recvBuffer, '\0', 128); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("receive data from server:%s\n", recvBuffer); close(sock_fd); return 0; } /***************** rds_server.c ********************/ int main(void) { struct sockaddr_in fromAddr; int sock_fd; struct sockaddr_in serverAddr; unsigned int addrLen; char recvBuffer[128]; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if(sock_fd < 0) { perror("create socket error\n"); exit(0); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4000); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind error\n"); close(sock_fd); exit(1); } printf("server is waiting to receive data...\n"); msg.msg_name = &fromAddr; /* * I add 16 to sizeof(fromAddr), ie 32, * and pay attention to the definition of fromAddr, * recvmsg() will overwrite sock_fd, * since kernel will copy 32 bytes to userspace. * * If you just use sizeof(fromAddr), it works fine. * */ msg.msg_namelen = sizeof(fromAddr) + 16; /* msg.msg_namelen = sizeof(fromAddr); */ msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; while (1) { printf("old socket fd=%d\n", sock_fd); if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("server received data from client:%s\n", recvBuffer); printf("msg.msg_namelen=%d\n", msg.msg_namelen); printf("new socket fd=%d\n", sock_fd); strcat(recvBuffer, "--data from server"); if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendmsg()\n"); close(sock_fd); exit(1); } } close(sock_fd); return 0; } Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23openvswitch: potential NULL deref in sample()Dan Carpenter
If there is no OVS_SAMPLE_ATTR_ACTIONS set then "acts_list" is NULL and it leads to a NULL dereference when we call nla_len(acts_list). This is a static checker fix, not something I have seen in testing. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23tcp: dont drop MTU reduction indicationsEric Dumazet
ICMP messages generated in output path if frame length is bigger than mtu are actually lost because socket is owned by user (doing the xmit) One example is the ipgre_tunnel_xmit() calling icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); We had a similar case fixed in commit a34a101e1e6 (ipv6: disable GSO on sockets hitting dst_allfrag). Problem of such fix is that it relied on retransmit timers, so short tcp sessions paid a too big latency increase price. This patch uses the tcp_release_cb() infrastructure so that MTU reduction messages (ICMP messages) are not lost, and no extra delay is added in TCP transmits. Reported-by: Maciej Żenczykowski <maze@google.com> Diagnosed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23tcp: avoid oops in tcp_metrics and reset tcpm_stampJulian Anastasov
In tcp_tw_remember_stamp we incorrectly checked tw instead of tm, it can lead to oops if the cached entry is not found. tcpm_stamp was not updated in tcpm_check_stamp when tcpm_suck_dst was called, move the update into tcpm_suck_dst, so that we do not call it infinitely on every next cache hit after TCP_METRICS_TIMEOUT. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22net: Fix references to out-of-scope variables in put_cmsg_compat()Jesper Juhl
In net/compat.c::put_cmsg_compat() we may assign 'data' the address of either the 'ctv' or 'cts' local variables inside the 'if (!COMPAT_USE_64BIT_TIME)' branch. Those variables go out of scope at the end of the 'if' statement, so when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what it may be refering to - not good. Fix the problem by simply giving 'ctv' and 'cts' function scope. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22Merge branch 'kill_rtcache'David S. Miller
The ipv4 routing cache is non-deterministic, performance wise, and is subject to reasonably easy to launch denial of service attacks. The routing cache works great for well behaved traffic, and the world was a much friendlier place when the tradeoffs that led to the routing cache's design were considered. What it boils down to is that the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables. The former of which is controllable by external entitites. Even for "well behaved" legitimate traffic, high volume sites can see hit rates in the routing cache of only ~%10. The general flow of this patch series is that first the routing cache is removed. We build a completely new rtable entry every lookup request. Next we make some simplifications due to the fact that removing the routing cache causes several members of struct rtable to become no longer necessary. Then we need to make some amends such that we can legally cache pre-constructed routes in the FIB nexthops. Firstly, we need to invalidate routes which are hit with nexthop exceptions. Secondly we have to change the semantics of rt->rt_gateway such that zero means that the destination is on-link and non-zero otherwise. Now that the preparations are ready, we start caching precomputed routes in the FIB nexthops. Output and input routes need different kinds of care when determining if we can legally do such caching or not. The details are in the commit log messages for those changes. The patch series then winds down with some more struct rtable simplifications and other tidy ups that remove unnecessary overhead. On a SPARC-T3 output route lookups are ~876 cycles. Input route lookups are ~1169 cycles with rpfilter disabled, and about ~1468 cycles with rpfilter enabled. These measurements were taken with the kbench_mod test module in the net_test_tools GIT tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git That GIT tree also includes a udpflood tester tool and stresses route lookups on packet output. For example, on the same SPARC-T3 system we can run: time ./udpflood -l 10000000 10.2.2.11 with routing cache: real 1m21.955s user 0m6.530s sys 1m15.390s without routing cache: real 1m31.678s user 0m6.520s sys 1m25.140s Performance undoubtedly can easily be improved further. For example fib_table_lookup() performs a lot of excessive computations with all the masking and shifting, some of it conditionalized to deal with edge cases. Also, Eric's no-ref optimization for input route lookups can be re-instated for the FIB nexthop caching code path. I would be really pleased if someone would work on that. In fact anyone suitable motivated can just fire up perf on the loading of the test net_test_tools benchmark kernel module. I spend much of my time going: bash# perf record insmod ./kbench_mod.ko dst=172.30.42.22 src=74.128.0.1 iif=2 bash# perf report Thanks to helpful feedback from Joe Perches, Eric Dumazet, Ben Hutchings, and others. Signed-off-by: David S. Miller <davem@davemloft.net>