From f7ab97f78a5c573e49474edbd260ea6898ddccda Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Fri, 16 Nov 2007 16:09:28 -0800 Subject: [CAN]: Add documentation This patch adds documentation for the PF_CAN protocol family. Signed-off-by: Oliver Hartkopp Signed-off-by: Urs Thuermann Signed-off-by: David S. Miller --- Documentation/networking/00-INDEX | 2 + Documentation/networking/can.txt | 629 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 631 insertions(+) create mode 100644 Documentation/networking/can.txt (limited to 'Documentation') diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 563e442f2d4..b4eefadb9ad 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -24,6 +24,8 @@ baycom.txt - info on the driver for Baycom style amateur radio modems bridge.txt - where to get user space programs for ethernet bridging with Linux. +can.txt + - documentation on CAN protocol family. cops.txt - info on the COPS LocalTalk Linux driver cs89x0.txt diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt new file mode 100644 index 00000000000..f1b2de17092 --- /dev/null +++ b/Documentation/networking/can.txt @@ -0,0 +1,629 @@ +============================================================================ + +can.txt + +Readme file for the Controller Area Network Protocol Family (aka Socket CAN) + +This file contains + + 1 Overview / What is Socket CAN + + 2 Motivation / Why using the socket API + + 3 Socket CAN concept + 3.1 receive lists + 3.2 local loopback of sent frames + 3.3 network security issues (capabilities) + 3.4 network problem notifications + + 4 How to use Socket CAN + 4.1 RAW protocol sockets with can_filters (SOCK_RAW) + 4.1.1 RAW socket option CAN_RAW_FILTER + 4.1.2 RAW socket option CAN_RAW_ERR_FILTER + 4.1.3 RAW socket option CAN_RAW_LOOPBACK + 4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS + 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + 4.3 connected transport protocols (SOCK_SEQPACKET) + 4.4 unconnected transport protocols (SOCK_DGRAM) + + 5 Socket CAN core module + 5.1 can.ko module params + 5.2 procfs content + 5.3 writing own CAN protocol modules + + 6 CAN network drivers + 6.1 general settings + 6.2 local loopback of sent frames + 6.3 CAN controller hardware filters + 6.4 currently supported CAN hardware + 6.5 todo + + 7 Credits + +============================================================================ + +1. Overview / What is Socket CAN +-------------------------------- + +The socketcan package is an implementation of CAN protocols +(Controller Area Network) for Linux. CAN is a networking technology +which has widespread use in automation, embedded devices, and +automotive fields. While there have been other CAN implementations +for Linux based on character devices, Socket CAN uses the Berkeley +socket API, the Linux network stack and implements the CAN device +drivers as network interfaces. The CAN socket API has been designed +as similar as possible to the TCP/IP protocols to allow programmers, +familiar with network programming, to easily learn how to use CAN +sockets. + +2. Motivation / Why using the socket API +---------------------------------------- + +There have been CAN implementations for Linux before Socket CAN so the +question arises, why we have started another project. Most existing +implementations come as a device driver for some CAN hardware, they +are based on character devices and provide comparatively little +functionality. Usually, there is only a hardware-specific device +driver which provides a character device interface to send and +receive raw CAN frames, directly to/from the controller hardware. +Queueing of frames and higher-level transport protocols like ISO-TP +have to be implemented in user space applications. Also, most +character-device implementations support only one single process to +open the device at a time, similar to a serial interface. Exchanging +the CAN controller requires employment of another device driver and +often the need for adaption of large parts of the application to the +new driver's API. + +Socket CAN was designed to overcome all of these limitations. A new +protocol family has been implemented which provides a socket interface +to user space applications and which builds upon the Linux network +layer, so to use all of the provided queueing functionality. A device +driver for CAN controller hardware registers itself with the Linux +network layer as a network device, so that CAN frames from the +controller can be passed up to the network layer and on to the CAN +protocol family module and also vice-versa. Also, the protocol family +module provides an API for transport protocol modules to register, so +that any number of transport protocols can be loaded or unloaded +dynamically. In fact, the can core module alone does not provide any +protocol and cannot be used without loading at least one additional +protocol module. Multiple sockets can be opened at the same time, +on different or the same protocol module and they can listen/send +frames on different or the same CAN IDs. Several sockets listening on +the same interface for frames with the same CAN ID are all passed the +same received matching CAN frames. An application wishing to +communicate using a specific transport protocol, e.g. ISO-TP, just +selects that protocol when opening the socket, and then can read and +write application data byte streams, without having to deal with +CAN-IDs, frames, etc. + +Similar functionality visible from user-space could be provided by a +character device, too, but this would lead to a technically inelegant +solution for a couple of reasons: + +* Intricate usage. Instead of passing a protocol argument to + socket(2) and using bind(2) to select a CAN interface and CAN ID, an + application would have to do all these operations using ioctl(2)s. + +* Code duplication. A character device cannot make use of the Linux + network queueing code, so all that code would have to be duplicated + for CAN networking. + +* Abstraction. In most existing character-device implementations, the + hardware-specific device driver for a CAN controller directly + provides the character device for the application to work with. + This is at least very unusual in Unix systems for both, char and + block devices. For example you don't have a character device for a + certain UART of a serial interface, a certain sound chip in your + computer, a SCSI or IDE controller providing access to your hard + disk or tape streamer device. Instead, you have abstraction layers + which provide a unified character or block device interface to the + application on the one hand, and a interface for hardware-specific + device drivers on the other hand. These abstractions are provided + by subsystems like the tty layer, the audio subsystem or the SCSI + and IDE subsystems for the devices mentioned above. + + The easiest way to implement a CAN device driver is as a character + device without such a (complete) abstraction layer, as is done by most + existing drivers. The right way, however, would be to add such a + layer with all the functionality like registering for certain CAN + IDs, supporting several open file descriptors and (de)multiplexing + CAN frames between them, (sophisticated) queueing of CAN frames, and + providing an API for device drivers to register with. However, then + it would be no more difficult, or may be even easier, to use the + networking framework provided by the Linux kernel, and this is what + Socket CAN does. + + The use of the networking framework of the Linux kernel is just the + natural and most appropriate way to implement CAN for Linux. + +3. Socket CAN concept +--------------------- + + As described in chapter 2 it is the main goal of Socket CAN to + provide a socket interface to user space applications which builds + upon the Linux network layer. In contrast to the commonly known + TCP/IP and ethernet networking, the CAN bus is a broadcast-only(!) + medium that has no MAC-layer addressing like ethernet. The CAN-identifier + (can_id) is used for arbitration on the CAN-bus. Therefore the CAN-IDs + have to be chosen uniquely on the bus. When designing a CAN-ECU + network the CAN-IDs are mapped to be sent by a specific ECU. + For this reason a CAN-ID can be treated best as a kind of source address. + + 3.1 receive lists + + The network transparent access of multiple applications leads to the + problem that different applications may be interested in the same + CAN-IDs from the same CAN network interface. The Socket CAN core + module - which implements the protocol family CAN - provides several + high efficient receive lists for this reason. If e.g. a user space + application opens a CAN RAW socket, the raw protocol module itself + requests the (range of) CAN-IDs from the Socket CAN core that are + requested by the user. The subscription and unsubscription of + CAN-IDs can be done for specific CAN interfaces or for all(!) known + CAN interfaces with the can_rx_(un)register() functions provided to + CAN protocol modules by the SocketCAN core (see chapter 5). + To optimize the CPU usage at runtime the receive lists are split up + into several specific lists per device that match the requested + filter complexity for a given use-case. + + 3.2 local loopback of sent frames + + As known from other networking concepts the data exchanging + applications may run on the same or different nodes without any + change (except for the according addressing information): + + ___ ___ ___ _______ ___ + | _ | | _ | | _ | | _ _ | | _ | + ||A|| ||B|| ||C|| ||A| |B|| ||C|| + |___| |___| |___| |_______| |___| + | | | | | + -----------------(1)- CAN bus -(2)--------------- + + To ensure that application A receives the same information in the + example (2) as it would receive in example (1) there is need for + some kind of local loopback of the sent CAN frames on the appropriate + node. + + The Linux network devices (by default) just can handle the + transmission and reception of media dependent frames. Due to the + arbritration on the CAN bus the transmission of a low prio CAN-ID + may be delayed by the reception of a high prio CAN frame. To + reflect the correct* traffic on the node the loopback of the sent + data has to be performed right after a successful transmission. If + the CAN network interface is not capable of performing the loopback for + some reason the SocketCAN core can do this task as a fallback solution. + See chapter 6.2 for details (recommended). + + The loopback functionality is enabled by default to reflect standard + networking behaviour for CAN applications. Due to some requests from + the RT-SocketCAN group the loopback optionally may be disabled for each + separate socket. See sockopts from the CAN RAW sockets in chapter 4.1. + + * = you really like to have this when you're running analyser tools + like 'candump' or 'cansniffer' on the (same) node. + + 3.3 network security issues (capabilities) + + The Controller Area Network is a local field bus transmitting only + broadcast messages without any routing and security concepts. + In the majority of cases the user application has to deal with + raw CAN frames. Therefore it might be reasonable NOT to restrict + the CAN access only to the user root, as known from other networks. + Since the currently implemented CAN_RAW and CAN_BCM sockets can only + send and receive frames to/from CAN interfaces it does not affect + security of others networks to allow all users to access the CAN. + To enable non-root users to access CAN_RAW and CAN_BCM protocol + sockets the Kconfig options CAN_RAW_USER and/or CAN_BCM_USER may be + selected at kernel compile time. + + 3.4 network problem notifications + + The use of the CAN bus may lead to several problems on the physical + and media access control layer. Detecting and logging of these lower + layer problems is a vital requirement for CAN users to identify + hardware issues on the physical transceiver layer as well as + arbitration problems and error frames caused by the different + ECUs. The occurrence of detected errors are important for diagnosis + and have to be logged together with the exact timestamp. For this + reason the CAN interface driver can generate so called Error Frames + that can optionally be passed to the user application in the same + way as other CAN frames. Whenever an error on the physical layer + or the MAC layer is detected (e.g. by the CAN controller) the driver + creates an appropriate error frame. Error frames can be requested by + the user application using the common CAN filter mechanisms. Inside + this filter definition the (interested) type of errors may be + selected. The reception of error frames is disabled by default. + +4. How to use Socket CAN +------------------------ + + Like TCP/IP, you first need to open a socket for communicating over a + CAN network. Since Socket CAN implements a new protocol family, you + need to pass PF_CAN as the first argument to the socket(2) system + call. Currently, there are two CAN protocols to choose from, the raw + socket protocol and the broadcast manager (BCM). So to open a socket, + you would write + + s = socket(PF_CAN, SOCK_RAW, CAN_RAW); + + and + + s = socket(PF_CAN, SOCK_DGRAM, CAN_BCM); + + respectively. After the successful creation of the socket, you would + normally use the bind(2) system call to bind the socket to a CAN + interface (which is different from TCP/IP due to different addressing + - see chapter 3). After binding (CAN_RAW) or connecting (CAN_BCM) + the socket, you can read(2) and write(2) from/to the socket or use + send(2), sendto(2), sendmsg(2) and the recv* counterpart operations + on the socket as usual. There are also CAN specific socket options + described below. + + The basic CAN frame structure and the sockaddr structure are defined + in include/linux/can.h: + + struct can_frame { + canid_t can_id; /* 32 bit CAN_ID + EFF/RTR/ERR flags */ + __u8 can_dlc; /* data length code: 0 .. 8 */ + __u8 data[8] __attribute__((aligned(8))); + }; + + The alignment of the (linear) payload data[] to a 64bit boundary + allows the user to define own structs and unions to easily access the + CAN payload. There is no given byteorder on the CAN bus by + default. A read(2) system call on a CAN_RAW socket transfers a + struct can_frame to the user space. + + The sockaddr_can structure has an interface index like the + PF_PACKET socket, that also binds to a specific interface: + + struct sockaddr_can { + sa_family_t can_family; + int can_ifindex; + union { + struct { canid_t rx_id, tx_id; } tp16; + struct { canid_t rx_id, tx_id; } tp20; + struct { canid_t rx_id, tx_id; } mcnet; + struct { canid_t rx_id, tx_id; } isotp; + } can_addr; + }; + + To determine the interface index an appropriate ioctl() has to + be used (example for CAN_RAW sockets without error checking): + + int s; + struct sockaddr_can addr; + struct ifreq ifr; + + s = socket(PF_CAN, SOCK_RAW, CAN_RAW); + + strcpy(ifr.ifr_name, "can0" ); + ioctl(s, SIOCGIFINDEX, &ifr); + + addr.can_family = AF_CAN; + addr.can_ifindex = ifr.ifr_ifindex; + + bind(s, (struct sockaddr *)&addr, sizeof(addr)); + + (..) + + To bind a socket to all(!) CAN interfaces the interface index must + be 0 (zero). In this case the socket receives CAN frames from every + enabled CAN interface. To determine the originating CAN interface + the system call recvfrom(2) may be used instead of read(2). To send + on a socket that is bound to 'any' interface sendto(2) is needed to + specify the outgoing interface. + + Reading CAN frames from a bound CAN_RAW socket (see above) consists + of reading a struct can_frame: + + struct can_frame frame; + + nbytes = read(s, &frame, sizeof(struct can_frame)); + + if (nbytes < 0) { + perror("can raw socket read"); + return 1; + } + + /* paraniod check ... */ + if (nbytes < sizeof(struct can_frame)) { + fprintf(stderr, "read: incomplete CAN frame\n"); + return 1; + } + + /* do something with the received CAN frame */ + + Writing CAN frames can be done similarly, with the write(2) system call: + + nbytes = write(s, &frame, sizeof(struct can_frame)); + + When the CAN interface is bound to 'any' existing CAN interface + (addr.can_ifindex = 0) it is recommended to use recvfrom(2) if the + information about the originating CAN interface is needed: + + struct sockaddr_can addr; + struct ifreq ifr; + socklen_t len = sizeof(addr); + struct can_frame frame; + + nbytes = recvfrom(s, &frame, sizeof(struct can_frame), + 0, (struct sockaddr*)&addr, &len); + + /* get interface name of the received CAN frame */ + ifr.ifr_ifindex = addr.can_ifindex; + ioctl(s, SIOCGIFNAME, &ifr); + printf("Received a CAN frame from interface %s", ifr.ifr_name); + + To write CAN frames on sockets bound to 'any' CAN interface the + outgoing interface has to be defined certainly. + + strcpy(ifr.ifr_name, "can0"); + ioctl(s, SIOCGIFINDEX, &ifr); + addr.can_ifindex = ifr.ifr_ifindex; + addr.can_family = AF_CAN; + + nbytes = sendto(s, &frame, sizeof(struct can_frame), + 0, (struct sockaddr*)&addr, sizeof(addr)); + + 4.1 RAW protocol sockets with can_filters (SOCK_RAW) + + Using CAN_RAW sockets is extensively comparable to the commonly + known access to CAN character devices. To meet the new possibilities + provided by the multi user SocketCAN approach, some reasonable + defaults are set at RAW socket binding time: + + - The filters are set to exactly one filter receiving everything + - The socket only receives valid data frames (=> no error frames) + - The loopback of sent CAN frames is enabled (see chapter 3.2) + - The socket does not receive its own sent frames (in loopback mode) + + These default settings may be changed before or after binding the socket. + To use the referenced definitions of the socket options for CAN_RAW + sockets, include . + + 4.1.1 RAW socket option CAN_RAW_FILTER + + The reception of CAN frames using CAN_RAW sockets can be controlled + by defining 0 .. n filters with the CAN_RAW_FILTER socket option. + + The CAN filter structure is defined in include/linux/can.h: + + struct can_filter { + canid_t can_id; + canid_t can_mask; + }; + + A filter matches, when + + & mask == can_id & mask + + which is analogous to known CAN controllers hardware filter semantics. + The filter can be inverted in this semantic, when the CAN_INV_FILTER + bit is set in can_id element of the can_filter structure. In + contrast to CAN controller hardware filters the user may set 0 .. n + receive filters for each open socket separately: + + struct can_filter rfilter[2]; + + rfilter[0].can_id = 0x123; + rfilter[0].can_mask = CAN_SFF_MASK; + rfilter[1].can_id = 0x200; + rfilter[1].can_mask = 0x700; + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, &rfilter, sizeof(rfilter)); + + To disable the reception of CAN frames on the selected CAN_RAW socket: + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, NULL, 0); + + To set the filters to zero filters is quite obsolete as not read + data causes the raw socket to discard the received CAN frames. But + having this 'send only' use-case we may remove the receive list in the + Kernel to save a little (really a very little!) CPU usage. + + 4.1.2 RAW socket option CAN_RAW_ERR_FILTER + + As described in chapter 3.4 the CAN interface driver can generate so + called Error Frames that can optionally be passed to the user + application in the same way as other CAN frames. The possible + errors are divided into different error classes that may be filtered + using the appropriate error mask. To register for every possible + error condition CAN_ERR_MASK can be used as value for the error mask. + The values for the error mask are defined in linux/can/error.h . + + can_err_mask_t err_mask = ( CAN_ERR_TX_TIMEOUT | CAN_ERR_BUSOFF ); + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_ERR_FILTER, + &err_mask, sizeof(err_mask)); + + 4.1.3 RAW socket option CAN_RAW_LOOPBACK + + To meet multi user needs the local loopback is enabled by default + (see chapter 3.2 for details). But in some embedded use-cases + (e.g. when only one application uses the CAN bus) this loopback + functionality can be disabled (separately for each socket): + + int loopback = 0; /* 0 = disabled, 1 = enabled (default) */ + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_LOOPBACK, &loopback, sizeof(loopback)); + + 4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS + + When the local loopback is enabled, all the sent CAN frames are + looped back to the open CAN sockets that registered for the CAN + frames' CAN-ID on this given interface to meet the multi user + needs. The reception of the CAN frames on the same socket that was + sending the CAN frame is assumed to be unwanted and therefore + disabled by default. This default behaviour may be changed on + demand: + + int recv_own_msgs = 1; /* 0 = disabled (default), 1 = enabled */ + + setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, + &recv_own_msgs, sizeof(recv_own_msgs)); + + 4.2 Broadcast Manager protocol sockets (SOCK_DGRAM) + 4.3 connected transport protocols (SOCK_SEQPACKET) + 4.4 unconnected transport protocols (SOCK_DGRAM) + + +5. Socket CAN core module +------------------------- + + The Socket CAN core module implements the protocol family + PF_CAN. CAN protocol modules are loaded by the core module at + runtime. The core module provides an interface for CAN protocol + modules to subscribe needed CAN IDs (see chapter 3.1). + + 5.1 can.ko module params + + - stats_timer: To calculate the Socket CAN core statistics + (e.g. current/maximum frames per second) this 1 second timer is + invoked at can.ko module start time by default. This timer can be + disabled by using stattimer=0 on the module comandline. + + - debug: (removed since SocketCAN SVN r546) + + 5.2 procfs content + + As described in chapter 3.1 the Socket CAN core uses several filter + lists to deliver received CAN frames to CAN protocol modules. These + receive lists, their filters and the count of filter matches can be + checked in the appropriate receive list. All entries contain the + device and a protocol module identifier: + + foo@bar:~$ cat /proc/net/can/rcvlist_all + + receive list 'rx_all': + (vcan3: no entry) + (vcan2: no entry) + (vcan1: no entry) + device can_id can_mask function userdata matches ident + vcan0 000 00000000 f88e6370 f6c6f400 0 raw + (any: no entry) + + In this example an application requests any CAN traffic from vcan0. + + rcvlist_all - list for unfiltered entries (no filter operations) + rcvlist_eff - list for single extended frame (EFF) entries + rcvlist_err - list for error frames masks + rcvlist_fil - list for mask/value filters + rcvlist_inv - list for mask/value filters (inverse semantic) + rcvlist_sff - list for single standard frame (SFF) entries + + Additional procfs files in /proc/net/can + + stats - Socket CAN core statistics (rx/tx frames, match ratios, ...) + reset_stats - manual statistic reset + version - prints the Socket CAN core version and the ABI version + + 5.3 writing own CAN protocol modules + + To implement a new protocol in the protocol family PF_CAN a new + protocol has to be defined in include/linux/can.h . + The prototypes and definitions to use the Socket CAN core can be + accessed by including include/linux/can/core.h . + In addition to functions that register the CAN protocol and the + CAN device notifier chain there are functions to subscribe CAN + frames received by CAN interfaces and to send CAN frames: + + can_rx_register - subscribe CAN frames from a specific interface + can_rx_unregister - unsubscribe CAN frames from a specific interface + can_send - transmit a CAN frame (optional with local loopback) + + For details see the kerneldoc documentation in net/can/af_can.c or + the source code of net/can/raw.c or net/can/bcm.c . + +6. CAN network drivers +---------------------- + + Writing a CAN network device driver is much easier than writing a + CAN character device driver. Similar to other known network device + drivers you mainly have to deal with: + + - TX: Put the CAN frame from the socket buffer to the CAN controller. + - RX: Put the CAN frame from the CAN controller to the socket buffer. + + See e.g. at Documentation/networking/netdevices.txt . The differences + for writing CAN network device driver are described below: + + 6.1 general settings + + dev->type = ARPHRD_CAN; /* the netdevice hardware type */ + dev->flags = IFF_NOARP; /* CAN has no arp */ + + dev->mtu = sizeof(struct can_frame); + + The struct can_frame is the payload of each socket buffer in the + protocol family PF_CAN. + + 6.2 local loopback of sent frames + + As described in chapter 3.2 the CAN network device driver should + support a local loopback functionality similar to the local echo + e.g. of tty devices. In this case the driver flag IFF_ECHO has to be + set to prevent the PF_CAN core from locally echoing sent frames + (aka loopback) as fallback solution: + + dev->flags = (IFF_NOARP | IFF_ECHO); + + 6.3 CAN controller hardware filters + + To reduce the interrupt load on deep embedded systems some CAN + controllers support the filtering of CAN IDs or ranges of CAN IDs. + These hardware filter capabilities vary from controller to + controller and have to be identified as not feasible in a multi-user + networking approach. The use of the very controller specific + hardware filters could make sense in a very dedicated use-case, as a + filter on driver level would affect all users in the multi-user + system. The high efficient filter sets inside the PF_CAN core allow + to set different multiple filters for each socket separately. + Therefore the use of hardware filters goes to the category 'handmade + tuning on deep embedded systems'. The author is running a MPC603e + @133MHz with four SJA1000 CAN controllers from 2002 under heavy bus + load without any problems ... + + 6.4 currently supported CAN hardware (September 2007) + + On the project website http://developer.berlios.de/projects/socketcan + there are different drivers available: + + vcan: Virtual CAN interface driver (if no real hardware is available) + sja1000: Philips SJA1000 CAN controller (recommended) + i82527: Intel i82527 CAN controller + mscan: Motorola/Freescale CAN controller (e.g. inside SOC MPC5200) + ccan: CCAN controller core (e.g. inside SOC h7202) + slcan: For a bunch of CAN adaptors that are attached via a + serial line ASCII protocol (for serial / USB adaptors) + + Additionally the different CAN adaptors (ISA/PCI/PCMCIA/USB/Parport) + from PEAK Systemtechnik support the CAN netdevice driver model + since Linux driver v6.0: http://www.peak-system.com/linux/index.htm + + Please check the Mailing Lists on the berlios OSS project website. + + 6.5 todo (September 2007) + + The configuration interface for CAN network drivers is still an open + issue that has not been finalized in the socketcan project. Also the + idea of having a library module (candev.ko) that holds functions + that are needed by all CAN netdevices is not ready to ship. + Your contribution is welcome. + +7. Credits +---------- + + Oliver Hartkopp (PF_CAN core, filters, drivers, bcm) + Urs Thuermann (PF_CAN core, kernel integration, socket interfaces, raw, vcan) + Jan Kizka (RT-SocketCAN core, Socket-API reconciliation) + Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews) + Robert Schwebel (design reviews, PTXdist integration) + Marc Kleine-Budde (design reviews, Kernel 2.6 cleanups, drivers) + Benedikt Spranger (reviews) + Thomas Gleixner (LKML reviews, coding style, posting hints) + Andrey Volkov (kernel subtree structure, ioctls, mscan driver) + Matthias Brukner (first SJA1000 CAN netdevice implementation Q2/2003) + Klaus Hitschler (PEAK driver integration) + Uwe Koppe (CAN netdevices with PF_PACKET approach) + Michael Schulze (driver layer loopback requirement, RT CAN drivers review) -- cgit v1.2.3 From 8e8c71f1ab0ca1c4e74efad14533b991524dcb6c Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Wed, 21 Nov 2007 09:56:48 -0200 Subject: [DCCP]: Honour and make use of shutdown option set by user This extends the DCCP socket API by honouring any shutdown(2) option set by the user. The behaviour is, as much as possible, made consistent with the API for TCP's shutdown. This patch exploits the information provided by the user via the socket API to reduce processing costs: * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID; * if the write end is closed (SHUT_WR), the same idea applies, but with a difference - as long as the TX queue has not been drained, we need to receive feedback to keep congestion-control rates up to date. Hence SHUT_WR is honoured only after the last packet (under congestion control) has been sent; * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c). Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added to this by the present patch. There will also no longer be any delivery when the socket is in the final stages, i.e. when one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at that stage the connection is its final stages. Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7). Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make sure the socket is a TCP socket". This should probably be extended to mean `TCP or DCCP socket' (the code is also used by UDP and raw sockets). Signed-off-by: Gerrit Renker Signed-off-by: Ian McDonald Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller --- Documentation/networking/dccp.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index afb66f9a8af..f771034e8f2 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -72,6 +72,8 @@ DCCP_SOCKOPT_CCID_TX_INFO Returns a `struct tfrc_tx_info' in optval; the buffer for optval and optlen must be set to at least sizeof(struct tfrc_tx_info). +On unidirectional connections it is useful to close the unused half-connection +via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs. Sysctl variables ================ -- cgit v1.2.3 From ebe6f7e73c3efec1de295205806b4550fcb468cd Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Wed, 21 Nov 2007 10:00:17 -0200 Subject: [DCCP]: Update documentation This updates the DCCP documentation, following input from Ian McDonald, clarifiying the status of DCCP, and adding a note about the test tree. Signed-off-by: Gerrit Renker Signed-off-by: Ian McDonald Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller --- Documentation/networking/dccp.txt | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index f771034e8f2..fdc93beec05 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -19,19 +19,23 @@ for real time and multimedia traffic. It has a base protocol and pluggable congestion control IDs (CCIDs). -It is at proposed standard RFC status and the homepage for DCCP as a protocol -is at: - http://www.read.cs.ucla.edu/dccp/ +DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol +is at http://www.ietf.org/html.charters/dccp-charter.html Missing features ================ -The DCCP implementation does not currently have all the features that are in -the RFC. +The Linux DCCP implementation does not currently support all the features that are +specified in RFCs 4340...42. The known bugs are at: http://linux-net.osdl.org/index.php/TODO#DCCP +For more up-to-date versions of the DCCP implementation, please consider using +the experimental DCCP test tree; instructions for checking this out are on: +http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree + + Socket options ============== -- cgit v1.2.3 From e333b3edc489151afda2a4f6c798842c64cb67a4 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Wed, 21 Nov 2007 10:09:56 -0200 Subject: [DCCP]: Promote CCID2 as default CCID This patch addresses the following problems: 1. DCCP relies for its proper functioning on having at least one CCID module enabled (as in TCP plugable congestion control). Currently it is possible to disable both CCIDs and thus leave the DCCP module in a compiled, but entirely non-functional state: no sockets can be created when no CCID is available. Furthermore, the protocol is (again like TCP) not intended to be used without CCIDs. Last, a non-empty CCID list is needed for doing CCID feature negotiation. 2. Internally the default CCID that is advertised by the Linux host is set to CCID2 (DCCPF_INITIAL_CCID in include/linux/dccp.h). Disabling CCID2 in the Kconfig menu without changing the defaults leads to a failure `module not found' when trying to load the dccp module (which internally tries to load the default CCID). 3. The specification (RFC 4340, sec. 10) treats CCID2 somewhat like a `minimum common denominator'; the specification says that: * "New connections start with CCID 2 for both endpoints" * "A DCCP implementation intended for general use, such as an implementation in a general-purpose operating system kernel, SHOULD implement at least CCID 2. The intent is to make CCID 2 broadly available for interoperability [...]" Providing CCID2 as minimum-required CCID (like Reno/Cubic in TCP) thus seems reasonable. Hence this patch automatically selects CCID2 when DCCP is enabled. Documentation also added. Discussions with Ian McDonald on this subject are gratefully acknowledged. Signed-off-by: Gerrit Renker Signed-off-by: Ian McDonald Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller --- Documentation/networking/dccp.txt | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index fdc93beec05..ffb9ca937d6 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -14,8 +14,15 @@ Introduction ============ Datagram Congestion Control Protocol (DCCP) is an unreliable, connection -based protocol designed to solve issues present in UDP and TCP particularly -for real time and multimedia traffic. +oriented protocol designed to solve issues present in UDP and TCP, particularly +for real-time and multimedia (streaming) traffic. +It divides into a base protocol (RFC 4340) and plugable congestion control +modules called CCIDs. Like plugable TCP congestion control, at least one CCID +needs to be enabled in order for the protocol to function properly. In the Linux +implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as +the TCP-friendly CCID3 (RFC 4342), are optional. +For a brief introduction to CCIDs and suggestions for choosing a CCID to match +given applications, see section 10 of RFC 4340. It has a base protocol and pluggable congestion control IDs (CCIDs). -- cgit v1.2.3 From c28149016c24f4399c0a1eb0ebc15c92611223f0 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Wed, 21 Nov 2007 10:14:31 -0200 Subject: [DCCP]: Update documentation on ioctls Signed-off-by: Gerrit Renker Acked-by: Ian McDonald Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller --- Documentation/networking/dccp.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index ffb9ca937d6..d76905a5a08 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -136,6 +136,12 @@ sync_ratelimit = 125 ms sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit of this parameter is milliseconds; a value of 0 disables rate-limiting. +IOCTLS +====== +FIONREAD + Works as in udp(7): returns in the `int' argument pointer the size of + the next pending datagram in bytes, or 0 when no datagram is pending. + Notes ===== -- cgit v1.2.3 From 003faaa17793c478ed2babc56a618396e0ef96c2 Mon Sep 17 00:00:00 2001 From: "John W. Linville" Date: Sun, 27 Jan 2008 22:48:37 -0800 Subject: bcm43xx: mark as obsolete and schedule for removal Schedule bcm43xx for for removal in the 2.6.26 development window. Signed-off-by: John W. Linville Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 25370662cc5..8d5f9f8b7a7 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -306,3 +306,12 @@ Why: These drivers are superseded by i810fb, intelfb and savagefb. Who: Jean Delvare --------------------------- + +What: bcm43xx wireless network driver +When: 2.6.26 +Files: drivers/net/wireless/bcm43xx +Why: This driver's functionality has been replaced by the + mac80211-based b43 and b43legacy drivers. +Who: John W. Linville + +--------------------------- -- cgit v1.2.3 From edae58ead57319b0bcf16e48a32365145c7bf0b7 Mon Sep 17 00:00:00 2001 From: "John W. Linville" Date: Wed, 21 Nov 2007 15:24:35 -0500 Subject: softmac: mark as obsolete and schedule for removal Schedule softmac for for removal in the 2.6.26 development window. Signed-off-by: John W. Linville Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 8d5f9f8b7a7..ec50a2ee7dd 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -315,3 +315,11 @@ Why: This driver's functionality has been replaced by the Who: John W. Linville --------------------------- + +What: iee80211 softmac wireless networking component +When: 2.6.26 (or after removal of bcm43xx and port of zd1211rw to mac80211) +Files: net/ieee80211/softmac +Why: No in-kernel drivers will depend on it any longer. +Who: John W. Linville + +--------------------------- -- cgit v1.2.3 From cb75994ec311b2cd50e5205efdcc0696abd6675d Mon Sep 17 00:00:00 2001 From: Wang Chen Date: Mon, 3 Dec 2007 22:33:28 +1100 Subject: [UDP]: Defer InDataGrams increment until recvmsg() does checksum Thanks dave, herbert, gerrit, andi and other people for your discussion about this problem. UdpInDatagrams can be confusing because it counts packets that might be dropped later. Move UdpInDatagrams into recvmsg() as allowed by the RFC. Signed-off-by: Wang Chen Signed-off-by: Herbert Xu Signed-off-by: David S. Miller --- Documentation/networking/udplite.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt index b6409cab075..3870f280280 100644 --- a/Documentation/networking/udplite.txt +++ b/Documentation/networking/udplite.txt @@ -236,7 +236,7 @@ This displays UDP-Lite statistics variables, whose meaning is as follows. - InDatagrams: Total number of received datagrams. + InDatagrams: The total number of datagrams delivered to users. NoPorts: Number of packets received to an unknown port. These cases are counted separately (not as InErrors). -- cgit v1.2.3 From cb76c6a597350534d211ba79d92da1f9771f8226 Mon Sep 17 00:00:00 2001 From: Patrick McHardy Date: Tue, 4 Dec 2007 23:39:36 -0800 Subject: [NETFILTER]: ip_tables: remove obsolete SAME target Remove the ipt_SAME target as scheduled in feature-removal-schedule. Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 9 --------- 1 file changed, 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index ec50a2ee7dd..0927425026f 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -249,15 +249,6 @@ Who: Tejun Heo --------------------------- -What: iptables SAME target -When: 1.1. 2008 -Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h -Why: Obsolete for multiple years now, NAT core provides the same behaviour. - Unfixable broken wrt. 32/64 bit cleanness. -Who: Patrick McHardy - ---------------------------- - What: The arch/ppc and include/asm-ppc directories When: Jun 2008 Why: The arch/powerpc tree is the merged architecture for ppc32 and ppc64 -- cgit v1.2.3 From b8599d20708fa3bde1e414689f3474560c2d990b Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 13 Dec 2007 12:25:01 -0200 Subject: [DCCP]: Support for server holding timewait state This adds a socket option and signalling support for the case where the server holds timewait state on closing the connection, as described in RFC 4340, 8.3. Since holding timewait state at the server is the non-usual case, it is enabled via a socket option. Documentation for this socket option has been added. The setsockopt statement has been made resilient against different possible cases of expressing boolean `true' values using a suggestion by Ian McDonald. Signed-off-by: Gerrit Renker Signed-off-by: Ian McDonald Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller --- Documentation/networking/dccp.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index d76905a5a08..39131a3c78f 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -57,6 +57,12 @@ can be set before calling bind(). DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet size (application payload size) in bytes, see RFC 4340, section 14. +DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold +timewait state when closing the connection (RFC 4340, 8.3). The usual case is +that the closing server sends a CloseReq, whereupon the client holds timewait +state. When this boolean socket option is on, the server sends a Close instead +and will enter TIMEWAIT. This option must be set after accept() returns. + DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums always cover the entire packet and that only fully covered application data is -- cgit v1.2.3 From 558f82ef6e0d25e87f7468c07b6db1fbbf95a855 Mon Sep 17 00:00:00 2001 From: Masahide NAKAMURA Date: Thu, 20 Dec 2007 20:42:57 -0800 Subject: [XFRM]: Define packet dropping statistics. This statistics is shown factor dropped by transformation at /proc/net/xfrm_stat for developer. It is a counter designed from current transformation source code and defined as linux private MIB. See Documentation/networking/xfrm_proc.txt for the detail. Signed-off-by: Masahide NAKAMURA Signed-off-by: David S. Miller --- Documentation/networking/xfrm_proc.txt | 71 ++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) create mode 100644 Documentation/networking/xfrm_proc.txt (limited to 'Documentation') diff --git a/Documentation/networking/xfrm_proc.txt b/Documentation/networking/xfrm_proc.txt new file mode 100644 index 00000000000..ec9045b5c34 --- /dev/null +++ b/Documentation/networking/xfrm_proc.txt @@ -0,0 +1,71 @@ +XFRM proc - /proc/net/xfrm_* files +================================== +Masahide NAKAMURA + + +Transformation Statistics +------------------------- +xfrm_proc is a statistics shown factor dropped by transformation +for developer. +It is a counter designed from current transformation source code +and defined like linux private MIB. + +Inbound statistics +~~~~~~~~~~~~~~~~~~ +XfrmInError: + All errors which is not matched others +XfrmInBufferError: + No buffer is left +XfrmInHdrError: + Header error +XfrmInNoStates: + No state is found + i.e. Either inbound SPI, address, or IPsec protocol at SA is wrong +XfrmInStateProtoError: + Transformation protocol specific error + e.g. SA key is wrong +XfrmInStateModeError: + Transformation mode specific error +XfrmInSeqOutOfWindow: + Sequence out of window +XfrmInStateExpired: + State is expired +XfrmInStateMismatch: + State has mismatch option + e.g. UDP encapsulation type is mismatch +XfrmInStateInvalid: + State is invalid +XfrmInTmplMismatch: + No matching template for states + e.g. Inbound SAs are correct but SP rule is wrong +XfrmInNoPols: + No policy is found for states + e.g. Inbound SAs are correct but no SP is found +XfrmInPolBlock: + Policy discards +XfrmInPolError: + Policy error + +Outbound errors +~~~~~~~~~~~~~~~ +XfrmOutError: + All errors which is not matched others +XfrmOutBundleGenError: + Bundle generation error +XfrmOutBundleCheckError: + Bundle check error +XfrmOutNoStates: + No state is found +XfrmOutStateProtoError: + Transformation protocol specific error +XfrmOutStateModeError: + Transformation mode specific error + e.g. Outer header space is not enough +XfrmOutStateExpired: + State is expired +XfrmOutPolBlock: + Policy discards +XfrmOutPolDead: + Policy is dead +XfrmOutPolError: + Policy error -- cgit v1.2.3 From c21b39aca4f8f4975784e54cd3a1b80bab80dcc0 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Wed, 19 Dec 2007 01:26:16 +0100 Subject: mac80211: make PID rate control algorithm the default This makes the new PID TX rate control algorithm the default instead of the rc80211_simple rate control algorithm. The simple algorithm was flawed in several ways: it wasn't responsive at all and didn't age the information it was relying on properly. The PID algorithm allows us to tune characteristics such as responsiveness by adjusting parameters and was found to generally behave better. The default algorithm can be overridden to select simple instead. Which ever algorithm is the default is included as part of the mac80211 module automatically. The other algorithm (simple vs. pid) can be selected for inclusion as well. If EMBEDDED is selected then the choice is available to have no default specified and neither algorithm included in mac80211. The default algorithm can be set through a modparam. While at it, mark rc80211-simple as deprecated, and schedule it for removal. Signed-off-by: Stefano Brivio Signed-off-by: John W. Linville Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 0927425026f..a6cf8f6fc58 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -314,3 +314,11 @@ Why: No in-kernel drivers will depend on it any longer. Who: John W. Linville --------------------------- + +What: rc80211-simple rate control algorithm for mac80211 +When: 2.6.26 +Files: net/mac80211/rc80211-simple.c +Why: This algorithm was provided for reference but always exhibited bad + responsiveness and performance and has some serious flaws. It has been + replaced by rc80211-pid. +Who: Stefano Brivio -- cgit v1.2.3 From a1464ab61e66c96f9cffea335755de850fe8bdbd Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Wed, 19 Dec 2007 01:46:53 +0100 Subject: doc: fix typo in feature-removal-schedule Signed-off-by: Stefano Brivio Signed-off-by: John W. Linville Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index a6cf8f6fc58..2c3ae6c02a4 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -307,7 +307,7 @@ Who: John W. Linville --------------------------- -What: iee80211 softmac wireless networking component +What: ieee80211 softmac wireless networking component When: 2.6.26 (or after removal of bcm43xx and port of zd1211rw to mac80211) Files: net/ieee80211/softmac Why: No in-kernel drivers will depend on it any longer. -- cgit v1.2.3 From d9727bb2d516bc16bafee4216eec91cd2be5f30a Mon Sep 17 00:00:00 2001 From: Masahide NAKAMURA Date: Tue, 25 Dec 2007 20:56:26 -0800 Subject: [XFRM] Documentaion: Fix error example at XFRMOUTSTATEMODEERROR. Signed-off-by: Masahide NAKAMURA Signed-off-by: David S. Miller --- Documentation/networking/xfrm_proc.txt | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/xfrm_proc.txt b/Documentation/networking/xfrm_proc.txt index ec9045b5c34..53c1a58b02f 100644 --- a/Documentation/networking/xfrm_proc.txt +++ b/Documentation/networking/xfrm_proc.txt @@ -60,7 +60,6 @@ XfrmOutStateProtoError: Transformation protocol specific error XfrmOutStateModeError: Transformation mode specific error - e.g. Outer header space is not enough XfrmOutStateExpired: State is expired XfrmOutPolBlock: -- cgit v1.2.3 From 95766fff6b9a78d11fc2d3812dd035381690b55d Mon Sep 17 00:00:00 2001 From: Hideo Aoki Date: Mon, 31 Dec 2007 00:29:24 -0800 Subject: [UDP]: Add memory accounting. Signed-off-by: Takahiro Yasui Signed-off-by: Hideo Aoki Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 6f7872ba1de..17a6e46fbd4 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -446,6 +446,33 @@ tcp_dma_copybreak - INTEGER and CONFIG_NET_DMA is enabled. Default: 4096 +UDP variables: + +udp_mem - vector of 3 INTEGERs: min, pressure, max + Number of pages allowed for queueing by all UDP sockets. + + min: Below this number of pages UDP is not bothered about its + memory appetite. When amount of memory allocated by UDP exceeds + this number, UDP starts to moderate memory usage. + + pressure: This value was introduced to follow format of tcp_mem. + + max: Number of pages allowed for queueing by all UDP sockets. + + Default is calculated at boot time from amount of available memory. + +udp_rmem_min - INTEGER + Minimal size of receive buffer used by UDP sockets in moderation. + Each UDP socket is able to use the size for receiving data, even if + total pages of UDP sockets exceed udp_mem pressure. The unit is byte. + Default: 4096 + +udp_wmem_min - INTEGER + Minimal size of send buffer used by UDP sockets in moderation. + Each UDP socket is able to use the size for sending data, even if + total pages of UDP sockets exceed udp_mem pressure. The unit is byte. + Default: 4096 + CIPSOv4 Variables: cipso_cache_enable - BOOLEAN -- cgit v1.2.3 From f1862b0ae2294f6970f695abf02392d025e02dbe Mon Sep 17 00:00:00 2001 From: Adrian Bunk Date: Sun, 27 Jan 2008 23:04:43 -0800 Subject: [SHAPER]: The scheduled shaper removal. This patch contains the scheduled removal of the shaper driver. Signed-off-by: Adrian Bunk Acked-by: Alan Cox Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 9 ------ Documentation/networking/00-INDEX | 2 -- Documentation/networking/shaper.txt | 48 ------------------------------ 3 files changed, 59 deletions(-) delete mode 100644 Documentation/networking/shaper.txt (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 2c3ae6c02a4..cc02e67239e 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -280,15 +280,6 @@ Who: Thomas Gleixner --------------------------- -What: shaper network driver -When: January 2008 -Files: drivers/net/shaper.c, include/linux/if_shaper.h -Why: This driver has been marked obsolete for many years. - It was only designed to work on lower speed links and has design - flaws that lead to machine crashes. The qdisc infrastructure in - 2.4 or later kernels, provides richer features and is more robust. -Who: Stephen Hemminger - --------------------------- What: i2c-i810, i2c-prosavage and i2c-savage4 diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index b4eefadb9ad..02e56d447a8 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -84,8 +84,6 @@ policy-routing.txt - IP policy-based routing ray_cs.txt - Raylink Wireless LAN card driver info. -shaper.txt - - info on the module that can shape/limit transmitted traffic. sk98lin.txt - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit Ethernet Adapter family driver info diff --git a/Documentation/networking/shaper.txt b/Documentation/networking/shaper.txt deleted file mode 100644 index 6c4ebb66a90..00000000000 --- a/Documentation/networking/shaper.txt +++ /dev/null @@ -1,48 +0,0 @@ -Traffic Shaper For Linux - -This is the current BETA release of the traffic shaper for Linux. It works -within the following limits: - -o Minimum shaping speed is currently about 9600 baud (it can only -shape down to 1 byte per clock tick) - -o Maximum is about 256K, it will go above this but get a bit blocky. - -o If you ifconfig the master device that a shaper is attached to down -then your machine will follow. - -o The shaper must be a module. - - -Setup: - - A shaper device is configured using the shapeconfig program. -Typically you will do something like this - -shapecfg attach shaper0 eth1 -shapecfg speed shaper0 64000 -ifconfig shaper0 myhost netmask 255.255.255.240 broadcast 1.2.3.4.255 up -route add -net some.network netmask a.b.c.d dev shaper0 - -The shaper should have the same IP address as the device it is attached to -for normal use. - -Gotchas: - - The shaper shapes transmitted traffic. It's rather impossible to -shape received traffic except at the end (or a router) transmitting it. - - Gated/routed/rwhod/mrouted all see the shaper as an additional device -and will treat it as such unless patched. Note that for mrouted you can run -mrouted tunnels via a traffic shaper to control bandwidth usage. - - The shaper is device/route based. This makes it very easy to use -with any setup BUT less flexible. You may need to use iproute2 to set up -multiple route tables to get the flexibility. - - There is no "borrowing" or "sharing" scheme. This is a simple -traffic limiter. We implement Van Jacobson and Sally Floyd's CBQ -architecture into Linux 2.2. This is the preferred solution. Shaper is -for simple or back compatible setups. - -Alan -- cgit v1.2.3 From f9ef8a23c37677929e95ad4026a2fa0d449c1d3e Mon Sep 17 00:00:00 2001 From: Jan Engelhardt Date: Mon, 14 Jan 2008 23:43:34 -0800 Subject: [NETFILTER]: Update feature-removal-schedule.txt With all the newly introduced features, there is a lot to remove later on after a compatibility grace period of 2 years. Signed-off-by: Jan Engelhardt Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 32 ++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index cc02e67239e..166915df8d6 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -313,3 +313,35 @@ Why: This algorithm was provided for reference but always exhibited bad responsiveness and performance and has some serious flaws. It has been replaced by rc80211-pid. Who: Stefano Brivio + +--------------------------- + +What (Why): + - include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files + (superseded by xt_TOS/xt_tos target & match) + + - "forwarding" header files like ipt_mac.h in + include/linux/netfilter_ipv4/ and include/linux/netfilter_ipv6/ + + - xt_CONNMARK match revision 0 + (superseded by xt_CONNMARK match revision 1) + + - xt_MARK target revisions 0 and 1 + (superseded by xt_MARK match revision 2) + + - xt_connmark match revision 0 + (superseded by xt_connmark match revision 1) + + - xt_conntrack match revision 0 + (superseded by xt_conntrack match revision 1) + + - xt_iprange match revision 0, + include/linux/netfilter_ipv4/ipt_iprange.h + (superseded by xt_iprange match revision 1) + + - xt_mark match revision 0 + (superseded by xt_mark match revision 1) + +When: January 2009 or Linux 2.7.0, whichever comes first +Why: Superseded by newer revisions or modules +Who: Jan Engelhardt -- cgit v1.2.3 From 9a6c686799346b6c95405c9e051f5023873504fa Mon Sep 17 00:00:00 2001 From: Jay Vosburgh Date: Tue, 13 Nov 2007 20:25:48 -0800 Subject: [BONDING]: Documentation update Update the bonding documentation: more discussion on initialization and configuration, changes to discussion of packet reordering in balance-rr, update some out of date information. Based in part on input from Rick Jones and Andy Gospodarek . Signed-off-by: Jay Vosburgh Acked-by: Andy Gospodarek Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 204 ++++++++++++++++++++++++----------- 1 file changed, 143 insertions(+), 61 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 6cc30e0d579..a0cda062bc3 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1,7 +1,7 @@ Linux Ethernet Bonding Driver HOWTO - Latest update: 24 April 2006 + Latest update: 12 November 2007 Initial release : Thomas Davis Corrections, HA extensions : 2000/10/03-15 : @@ -166,12 +166,17 @@ to use ifenslave. 2. Bonding Driver Options ========================= - Options for the bonding driver are supplied as parameters to -the bonding module at load time. They may be given as command line -arguments to the insmod or modprobe command, but are usually specified -in either the /etc/modules.conf or /etc/modprobe.conf configuration -file, or in a distro-specific configuration file (some of which are -detailed in the next section). + Options for the bonding driver are supplied as parameters to the +bonding module at load time, or are specified via sysfs. + + Module options may be given as command line arguments to the +insmod or modprobe command, but are usually specified in either the +/etc/modules.conf or /etc/modprobe.conf configuration file, or in a +distro-specific configuration file (some of which are detailed in the next +section). + + Details on bonding support for sysfs is provided in the +"Configuring Bonding Manually via Sysfs" section, below. The available bonding driver parameters are listed below. If a parameter is not specified the default value is used. When initially @@ -812,11 +817,13 @@ the system /etc/modules.conf or /etc/modprobe.conf configuration file. 3.2 Configuration with Initscripts Support ------------------------------------------ - This section applies to distros using a version of initscripts -with bonding support, for example, Red Hat Linux 9 or Red Hat -Enterprise Linux version 3 or 4. On these systems, the network -initialization scripts have some knowledge of bonding, and can be -configured to control bonding devices. + This section applies to distros using a recent version of +initscripts with bonding support, for example, Red Hat Enterprise Linux +version 3 or later, Fedora, etc. On these systems, the network +initialization scripts have knowledge of bonding, and can be configured to +control bonding devices. Note that older versions of the initscripts +package have lower levels of support for bonding; this will be noted where +applicable. These distros will not automatically load the network adapter driver unless the ethX device is configured with an IP address. @@ -864,11 +871,31 @@ USERCTL=no Be sure to change the networking specific lines (IPADDR, NETMASK, NETWORK and BROADCAST) to match your network configuration. - Finally, it is necessary to edit /etc/modules.conf (or -/etc/modprobe.conf, depending upon your distro) to load the bonding -module with your desired options when the bond0 interface is brought -up. The following lines in /etc/modules.conf (or modprobe.conf) will -load the bonding module, and select its options: + For later versions of initscripts, such as that found with Fedora +7 and Red Hat Enterprise Linux version 5 (or later), it is possible, and, +indeed, preferable, to specify the bonding options in the ifcfg-bond0 +file, e.g. a line of the format: + +BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=+192.168.1.254" + + will configure the bond with the specified options. The options +specified in BONDING_OPTS are identical to the bonding module parameters +except for the arp_ip_target field. Each target should be included as a +separate option and should be preceded by a '+' to indicate it should be +added to the list of queried targets, e.g., + + arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2 + + is the proper syntax to specify multiple targets. When specifying +options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or +/etc/modprobe.conf. + + For older versions of initscripts that do not support +BONDING_OPTS, it is necessary to edit /etc/modules.conf (or +/etc/modprobe.conf, depending upon your distro) to load the bonding module +with your desired options when the bond0 interface is brought up. The +following lines in /etc/modules.conf (or modprobe.conf) will load the +bonding module, and select its options: alias bond0 bonding options bond0 mode=balance-alb miimon=100 @@ -883,9 +910,10 @@ up and running. 3.2.1 Using DHCP with Initscripts --------------------------------- - Recent versions of initscripts (the version supplied with -Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do -have support for assigning IP information to bonding devices via DHCP. + Recent versions of initscripts (the versions supplied with Fedora +Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to +work) have support for assigning IP information to bonding devices via +DHCP. To configure bonding for DHCP, configure it as described above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" @@ -895,18 +923,14 @@ is case sensitive. 3.2.2 Configuring Multiple Bonds with Initscripts ------------------------------------------------- - At this writing, the initscripts package does not directly -support loading the bonding driver multiple times, so the process for -doing so is the same as described in the "Configuring Multiple Bonds -Manually" section, below. - - NOTE: It has been observed that some Red Hat supplied kernels -are apparently unable to rename modules at load time (the "-o bond1" -part). Attempts to pass that option to modprobe will produce an -"Operation not permitted" error. This has been reported on some -Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels -exhibiting this problem, it will be impossible to configure multiple -bonds with differing parameters. + Initscripts packages that are included with Fedora 7 and Red Hat +Enterprise Linux 5 support multiple bonding interfaces by simply +specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the +number of the bond. This support requires sysfs support in the kernel, +and a bonding driver of version 3.0.0 or later. Other configurations may +not support this method for specifying multiple bonding interfaces; for +those instances, see the "Configuring Multiple Bonds Manually" section, +below. 3.3 Configuring Bonding Manually with Ifenslave ----------------------------------------------- @@ -977,15 +1001,58 @@ initialization scripts lack support for configuring multiple bonds. options, you may wish to use the "max_bonds" module parameter, documented above. - To create multiple bonding devices with differing options, it -is necessary to use bonding parameters exported by sysfs, documented -in the section below. + To create multiple bonding devices with differing options, it is +preferrable to use bonding parameters exported by sysfs, documented in the +section below. + + For versions of bonding without sysfs support, the only means to +provide multiple instances of bonding with differing options is to load +the bonding driver multiple times. Note that current versions of the +sysconfig network initialization scripts handle this automatically; if +your distro uses these scripts, no special action is needed. See the +section Configuring Bonding Devices, above, if you're not sure about your +network initialization scripts. + + To load multiple instances of the module, it is necessary to +specify a different name for each instance (the module loading system +requires that every loaded module, even multiple instances of the same +module, have a unique name). This is accomplished by supplying multiple +sets of bonding options in /etc/modprobe.conf, for example: + +alias bond0 bonding +options bond0 -o bond0 mode=balance-rr miimon=100 + +alias bond1 bonding +options bond1 -o bond1 mode=balance-alb miimon=50 + + will load the bonding module two times. The first instance is +named "bond0" and creates the bond0 device in balance-rr mode with an +miimon of 100. The second instance is named "bond1" and creates the +bond1 device in balance-alb mode with an miimon of 50. + + In some circumstances (typically with older distributions), +the above does not work, and the second bonding instance never sees +its options. In that case, the second options line can be substituted +as follows: + +install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ + mode=balance-alb miimon=50 + This may be repeated any number of times, specifying a new and +unique name in place of bond1 for each subsequent instance. + + It has been observed that some Red Hat supplied kernels are unable +to rename modules at load time (the "-o bond1" part). Attempts to pass +that option to modprobe will produce an "Operation not permitted" error. +This has been reported on some Fedora Core kernels, and has been seen on +RHEL 4 as well. On kernels exhibiting this problem, it will be impossible +to configure multiple bonds with differing parameters (as they are older +kernels, and also lack sysfs support). 3.4 Configuring Bonding Manually via Sysfs ------------------------------------------ - Starting with version 3.0, Channel Bonding may be configured + Starting with version 3.0.0, Channel Bonding may be configured via the sysfs interface. This interface allows dynamic configuration of all bonds in the system without unloading the module. It also allows for adding and removing bonds at runtime. Ifenslave is no @@ -1030,9 +1097,6 @@ To enslave interface eth0 to bond bond0: To free slave eth0 from bond bond0: # echo -eth0 > /sys/class/net/bond0/bonding/slaves - NOTE: The bond must be up before slaves can be added. All -slaves are freed when the interface is brought down. - When an interface is enslaved to a bond, symlinks between the two are created in the sysfs filesystem. In this case, you would get /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and @@ -1622,6 +1686,15 @@ one for each switch in the network). This will insure that, regardless of which switch is active, the ARP monitor has a suitable target to query. + Note, also, that of late many switches now support a functionality +generally referred to as "trunk failover." This is a feature of the +switch that causes the link state of a particular switch port to be set +down (or up) when the state of another switch port goes down (or up). +It's purpose is to propogate link failures from logically "exterior" ports +to the logically "interior" ports that bonding is able to monitor via +miimon. Availability and configuration for trunk failover varies by +switch, but this can be a viable alternative to the ARP monitor when using +suitable switches. 12. Configuring Bonding for Maximum Throughput ============================================== @@ -1709,7 +1782,7 @@ balance-rr: This mode is the only mode that will permit a single interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the - striping often results in peer systems receiving packets out + striping generally results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments. @@ -1721,22 +1794,20 @@ balance-rr: This mode is the only mode that will permit a single interface's worth of throughput, even after adjusting tcp_reordering. - Note that this out of order delivery occurs when both the - sending and receiving systems are utilizing a multiple - interface bond. Consider a configuration in which a - balance-rr bond feeds into a single higher capacity network - channel (e.g., multiple 100Mb/sec ethernets feeding a single - gigabit ethernet via an etherchannel capable switch). In this - configuration, traffic sent from the multiple 100Mb devices to - a destination connected to the gigabit device will not see - packets out of order. However, traffic sent from the gigabit - device to the multiple 100Mb devices may or may not see - traffic out of order, depending upon the balance policy of the - switch. Many switches do not support any modes that stripe - traffic (instead choosing a port based upon IP or MAC level - addresses); for those devices, traffic flowing from the - gigabit device to the many 100Mb devices will only utilize one - interface. + Note that the fraction of packets that will be delivered out of + order is highly variable, and is unlikely to be zero. The level + of reordering depends upon a variety of factors, including the + networking interfaces, the switch, and the topology of the + configuration. Speaking in general terms, higher speed network + cards produce more reordering (due to factors such as packet + coalescing), and a "many to many" topology will reorder at a + higher rate than a "many slow to one fast" configuration. + + Many switches do not support any modes that stripe traffic + (instead choosing a port based upon IP or MAC level addresses); + for those devices, traffic for a particular connection flowing + through the switch to a balance-rr bond will not utilize greater + than one interface's worth of bandwidth. If you are utilizing protocols other than TCP/IP, UDP for example, and your application can tolerate out of order @@ -1936,6 +2007,10 @@ Failover may be delayed via the downdelay bonding module option. 13.2 Duplicated Incoming Packets -------------------------------- + NOTE: Starting with version 3.0.2, the bonding driver has logic to +suppress duplicate packets, which should largely eliminate this problem. +The following description is kept for reference. + It is not uncommon to observe a short burst of duplicated traffic when the bonding device is first used, or after it has been idle for some period of time. This is most easily observed by issuing @@ -2096,6 +2171,9 @@ The new driver was designed to be SMP safe from the start. EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, devices need not be of the same speed. + Starting with version 3.2.1, bonding also supports Infiniband +slaves in active-backup mode. + 3. How many bonding devices can I have? There is no limit. @@ -2154,11 +2232,15 @@ switches currently available support 802.3ad. 8. Where does a bonding device get its MAC address from? - If not explicitly configured (with ifconfig or ip link), the -MAC address of the bonding device is taken from its first slave -device. This MAC address is then passed to all following slaves and -remains persistent (even if the first slave is removed) until the -bonding device is brought down or reconfigured. + When using slave devices that have fixed MAC addresses, or when +the fail_over_mac option is enabled, the bonding device's MAC address is +the MAC address of the active slave. + + For other configurations, if not explicitly configured (with +ifconfig or ip link), the MAC address of the bonding device is taken from +its first slave device. This MAC address is then passed to all following +slaves and remains persistent (even if the first slave is removed) until +the bonding device is brought down or reconfigured. If you wish to change the MAC address, you can set it with ifconfig or ip link: -- cgit v1.2.3 From eb189d8bc9824bcb2187ffdab27d77ab469264c3 Mon Sep 17 00:00:00 2001 From: Michael Buesch Date: Mon, 28 Jan 2008 14:47:41 -0800 Subject: b43: Add support for new firmware This patch adds support for new firmware. Old firmware is still supported until July 2008. To get new firmware, go to ftp://ftp.linksys.com/opensourcecode/wrt150nv11/1.51.3/ and download the tarball. We don't have a smaller tarball, yet. That will be fixed later. You can extract firmware out of the "wl_ap.o" file contained in this tarball using latest fwcutter. You must pass the option --unsupported to fwcutter. Fwcutter-010 with official support for a new firmware image will be released soon. Signed-off-by: Michael Buesch Signed-off-by: John W. Linville Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 166915df8d6..181bff00516 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -345,3 +345,12 @@ What (Why): When: January 2009 or Linux 2.7.0, whichever comes first Why: Superseded by newer revisions or modules Who: Jan Engelhardt + +--------------------------- + +What: b43 support for firmware revision < 410 +When: July 2008 +Why: The support code for the old firmware hurts code readability/maintainability + and slightly hurts runtime performance. Bugfixes for the old firmware + are not provided by Broadcom anymore. +Who: Michael Buesch -- cgit v1.2.3