Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/feature-removal-schedule.txt | 40
-rw-r--r-- Documentation/filesystems/afs.txt | 214
-rw-r--r-- Documentation/filesystems/proc.txt | 9
-rw-r--r-- Documentation/infiniband/user_mad.txt | 8
-rw-r--r-- Documentation/keys.txt | 12
-rw-r--r-- Documentation/networking/bonding.txt | 35
-rw-r--r-- Documentation/networking/dccp.txt | 10
-rw-r--r-- Documentation/networking/ip-sysctl.txt | 31
-rw-r--r-- Documentation/networking/rxrpc.txt | 859
-rw-r--r-- Documentation/networking/wan-router.txt | 1
-rw-r--r-- Documentation/s390/crypto/crypto-API.txt | 83
-rw-r--r-- Documentation/s390/zfcpdump.txt | 87
12 files changed, 1188 insertions, 201 deletions
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 19b4c96b2a4..6da663607f7 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -211,15 +211,6 @@ Who: Adrian Bunk <bunk@stusta.de>
---------------------------
-What: IPv4 only connection tracking/NAT/helpers
-When: 2.6.22
-Why: The new layer 3 independant connection tracking replaces the old
- IPv4 only version. After some stabilization of the new code the
- old one will be removed.
-Who: Patrick McHardy <kaber@trash.net>
-
----------------------------
-
What: ACPI hooks (X86_SPEEDSTEP_CENTRINO_ACPI) in speedstep-centrino driver
When: December 2006
Why: Speedstep-centrino driver with ACPI hooks and acpi-cpufreq driver are
@@ -294,18 +285,6 @@ Who: Richard Purdie <rpurdie@rpsys.net>
---------------------------
-What: Wireless extensions over netlink (CONFIG_NET_WIRELESS_RTNETLINK)
-When: with the merge of wireless-dev, 2.6.22 or later
-Why: The option/code is
- * not enabled on most kernels
- * not required by any userspace tools (except an experimental one,
- and even there only for some parts, others use ioctl)
- * pointless since wext is no longer evolving and the ioctl
- interface needs to be kept
-Who: Johannes Berg <johannes@sipsolutions.net>
-
----------------------------
-
What: i8xx_tco watchdog driver
When: in 2.6.22
Why: the i8xx_tco watchdog driver has been replaced by the iTCO_wdt
@@ -313,3 +292,22 @@ Why: the i8xx_tco watchdog driver has been replaced by the iTCO_wdt
Who: Wim Van Sebroeck <wim@iguana.be>
---------------------------
+
+What: Multipath cached routing support in ipv4
+When: in 2.6.23
+Why: Code was merged, then submitter immediately disappeared leaving
+ us with no maintainer and lots of bugs. The code should not have
+ been merged in the first place, and many aspects of its
+ implementation are blocking more critical core networking
+ development. It's marked EXPERIMENTAL and no distribution
+ enables it because it causes obscure crashes due to unfixable bugs
+ (interfaces don't return errors so memory allocation can't be
+ handled, calling contexts of these interfaces make handling
+ errors impossible too because they get called after we've
+ totally committed to creating a route object, for example).
+ This problem has existed for years and no forward progress
+ has ever been made, and nobody steps up to try and salvage
+ this code, so we're going to finally just get rid of it.
+Who: David S. Miller <davem@davemloft.net>
+
+---------------------------
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt
index 2f4237dfb8c..12ad6c7f4e5 100644
--- a/Documentation/filesystems/afs.txt
+++ b/Documentation/filesystems/afs.txt
@@ -1,31 +1,82 @@
+ ====================
kAFS: AFS FILESYSTEM
====================
-ABOUT
-=====
+Contents:
+
+ - Overview.
+ - Usage.
+ - Mountpoints.
+ - Proc filesystem.
+ - The cell database.
+ - Security.
+ - Examples.
+
+
+========
+OVERVIEW
+========
-This filesystem provides a fairly simple AFS filesystem driver. It is under
-development and only provides very basic facilities. It does not yet support
-the following AFS features:
+This filesystem provides a fairly simple secure AFS filesystem driver. It is
+under development and does not yet provide the full feature set. The features
+it does support include:
- (*) Write support.
- (*) Communications security.
- (*) Local caching.
- (*) pioctl() system call.
- (*) Automatic mounting of embedded mountpoints.
+ (*) Security (currently only AFS kaserver and KerberosIV tickets).
+ (*) File reading.
+ (*) Automounting.
+
+It does not yet support the following AFS features:
+
+ (*) Write support.
+
+ (*) Local caching.
+
+ (*) pioctl() system call.
+
+
+===========
+COMPILATION
+===========
+
+The filesystem should be enabled by turning on the kernel configuration
+options:
+
+ CONFIG_AF_RXRPC - The RxRPC protocol transport
+ CONFIG_RXKAD - The RxRPC Kerberos security handler
+ CONFIG_AFS - The AFS filesystem
+
+Additionally, the following can be turned on to aid debugging:
+
+ CONFIG_AF_RXRPC_DEBUG - Permit AF_RXRPC debugging to be enabled
+ CONFIG_AFS_DEBUG - Permit AFS debugging to be enabled
+
+They permit the debugging messages to be turned on dynamically by manipulating
+the masks in the following files:
+
+ /sys/module/af_rxrpc/parameters/debug
+ /sys/module/afs/parameters/debug
+
+
+=====
USAGE
=====
When inserting the driver modules the root cell must be specified along with a
list of volume location server IP addresses:
- insmod rxrpc.o
+ insmod af_rxrpc.o
+ insmod rxkad.o
insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
-The first module is a driver for the RxRPC remote operation protocol, and the
-second is the actual filesystem driver for the AFS filesystem.
+The first module is the AF_RXRPC network protocol driver. This provides the
+RxRPC remote operation protocol and may also be accessed from userspace. See:
+
+ Documentation/networking/rxrpc.txt
+
+The second module is the kerberos RxRPC security driver, and the third module
+is the actual filesystem driver for the AFS filesystem.
Once the module has been loaded, more modules can be added by the following
procedure:
@@ -33,7 +84,7 @@ procedure:
echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
Where the parameters to the "add" command are the name of a cell and a list of
-volume location servers within that cell.
+volume location servers within that cell, with the latter separated by colons.
Filesystems can be mounted anywhere by commands similar to the following:
@@ -42,11 +93,6 @@ Filesystems can be mounted anywhere by commands similar to the following:
mount -t afs "#root.afs." /afs
mount -t afs "#root.cell." /afs/cambridge
- NB: When using this on Linux 2.4, the mount command has to be different,
- since the filesystem doesn't have access to the device name argument:
-
- mount -t afs none /afs -ovol="#root.afs."
-
Where the initial character is either a hash or a percent symbol depending on
whether you definitely want a R/W volume (hash) or whether you'd prefer a R/O
volume, but are willing to use a R/W volume instead (percent).
@@ -60,55 +106,66 @@ named volume will be looked up in the cell specified during insmod.
Additional cells can be added through /proc (see later section).
+===========
MOUNTPOINTS
===========
-AFS has a concept of mountpoints. These are specially formatted symbolic links
-(of the same form as the "device name" passed to mount). kAFS presents these
-to the user as directories that have special properties:
+AFS has a concept of mountpoints. In AFS terms, these are specially formatted
+symbolic links (of the same form as the "device name" passed to mount). kAFS
+presents these to the user as directories that have a follow-link capability
+(ie: symbolic link semantics). If anyone attempts to access them, they will
+automatically cause the target volume to be mounted (if possible) on that site.
- (*) They cannot be listed. Running a program like "ls" on them will incur an
- EREMOTE error (Object is remote).
+Automatically mounted filesystems will be automatically unmounted approximately
+twenty minutes after they were last used. Alternatively they can be unmounted
+directly with the umount() system call.
- (*) Other objects can't be looked up inside of them. This also incurs an
- EREMOTE error.
+Manually unmounting an AFS volume will cause any idle submounts upon it to be
+culled first. If all are culled, then the requested volume will also be
+unmounted, otherwise error EBUSY will be returned.
- (*) They can be queried with the readlink() system call, which will return
- the name of the mountpoint to which they point. The "readlink" program
- will also work.
+This can be used by the administrator to attempt to unmount the whole AFS tree
+mounted on /afs in one go by doing:
- (*) They can be mounted on (which symbolic links can't).
+ umount /afs
+===============
PROC FILESYSTEM
===============
-The rxrpc module creates a number of files in various places in the /proc
-filesystem:
-
- (*) Firstly, some information files are made available in a directory called
- "/proc/net/rxrpc/". These list the extant transport endpoint, peer,
- connection and call records.
-
- (*) Secondly, some control files are made available in a directory called
- "/proc/sys/rxrpc/". Currently, all these files can be used for is to
- turn on various levels of tracing.
-
The AFS modules creates a "/proc/fs/afs/" directory and populates it:
- (*) A "cells" file that lists cells currently known to the afs module.
+ (*) A "cells" file that lists cells currently known to the afs module and
+ their usage counts:
+
+ [root@andromeda ~]# cat /proc/fs/afs/cells
+ USE NAME
+ 3 cambridge.redhat.com
(*) A directory per cell that contains files that list volume location
servers, volumes, and active servers known within that cell.
+ [root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
+ USE ADDR STATE
+ 4 172.16.18.91 0
+ [root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/vlservers
+ ADDRESS
+ 172.16.18.91
+ [root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/volumes
+ USE STT VLID[0] VLID[1] VLID[2] NAME
+ 1 Val 20000000 20000001 20000002 root.afs
+
+=================
THE CELL DATABASE
=================
-The filesystem maintains an internal database of all the cells it knows and
-the IP addresses of the volume location servers for those cells. The cell to
-which the computer belongs is added to the database when insmod is performed
-by the "rootcell=" argument.
+The filesystem maintains an internal database of all the cells it knows and the
+IP addresses of the volume location servers for those cells. The cell to which
+the system belongs is added to the database when insmod is performed by the
+"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
+the kernel command line.
Further cells can be added by commands similar to the following:
@@ -118,20 +175,65 @@ Further cells can be added by commands similar to the following:
No other cell database operations are available at this time.
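
As an illustration of the "rootcell=" format described above (a cell name
followed by colon-separated volume location server addresses), the following
userspace sketch splits such a string into its parts. It is not the parser
kAFS uses; struct cell_spec, parse_cell_spec() and the MAX_VL_SERVERS limit are
hypothetical names chosen for this example, and only IPv4 addresses are
assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

#define MAX_VL_SERVERS 8	/* illustrative limit, not taken from kAFS */

/* Hypothetical container for one parsed "cell:addr:addr..." string. */
struct cell_spec {
	char name[64];				/* cell name */
	struct in_addr vlservers[MAX_VL_SERVERS];	/* VL server addresses */
	int nr_vlservers;
};

/* Parse "cellname:a.b.c.d[:a.b.c.d...]".  Returns 0 on success, -1 if the
 * string is malformed or lists no servers. */
int parse_cell_spec(const char *spec, struct cell_spec *out)
{
	const char *colon = strchr(spec, ':');
	const char *p;
	size_t name_len;

	if (!colon)
		return -1;
	name_len = (size_t)(colon - spec);
	if (name_len == 0 || name_len >= sizeof(out->name))
		return -1;
	memcpy(out->name, spec, name_len);
	out->name[name_len] = '\0';
	out->nr_vlservers = 0;

	p = colon + 1;
	while (*p) {
		char addr[INET_ADDRSTRLEN];
		size_t len = strcspn(p, ":");	/* length of next address */

		if (len == 0 || len >= sizeof(addr) ||
		    out->nr_vlservers == MAX_VL_SERVERS)
			return -1;
		memcpy(addr, p, len);
		addr[len] = '\0';
		if (inet_pton(AF_INET, addr,
			      &out->vlservers[out->nr_vlservers]) != 1)
			return -1;
		out->nr_vlservers++;
		p += len;
		if (*p == ':')
			p++;		/* skip the separator */
	}
	return out->nr_vlservers > 0 ? 0 : -1;
}
```

Passing "cambridge.redhat.com:172.16.18.73:172.16.18.91" to parse_cell_spec()
yields the cell name plus two server addresses. Note that the "add" command
written to /proc/fs/afs/cells uses a space, not a colon, between the cell name
and the address list, so it would need a slightly different split.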
+========
+SECURITY
+========
+
+Secure operations are initiated by acquiring a key using the klog program. A
+very primitive klog program is available at:
+
+ http://people.redhat.com/~dhowells/rxrpc/klog.c
+
+This should be compiled by:
+
+ make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"
+
+And then run as:
+
+ ./klog
+
+Assuming it's successful, this adds a key of type RxRPC, named for the service
+and cell, eg: "afs@<cellname>". This can be viewed with the keyctl program or
+by cat'ing /proc/keys:
+
+ [root@andromeda ~]# keyctl show
+ Session Keyring
+ -3 --alswrv 0 0 keyring: _ses.3268
+ 2 --alswrv 0 0 \_ keyring: _uid.0
+ 111416553 --als--v 0 0 \_ rxrpc: afs@CAMBRIDGE.REDHAT.COM
+
+Currently the username, realm, password and proposed ticket lifetime are
+compiled in to the program.
+
+It is not required to acquire a key before using AFS facilities, but if one is
+not acquired then all operations will be governed by the anonymous user parts
+of the ACLs.
+
+If a key is acquired, then all AFS operations, including mounts and automounts,
+made by a possessor of that key will be secured with that key.
+
+If a file is opened with a particular key and then the file descriptor is
+passed to a process that doesn't have that key (perhaps over an AF_UNIX
+socket), then the operations on the file will be made with the key that was used to
+open the file.
+
+
+========
EXAMPLES
========
-Here's what I use to test this. Some of the names and IP addresses are local
-to my internal DNS. My "root.afs" partition has a mount point within it for
+Here's what I use to test this. Some of the names and IP addresses are local
+to my internal DNS. My "root.afs" partition has a mount point within it for
some public volumes volumes.
-insmod -S /tmp/rxrpc.o
-insmod -S /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+insmod /tmp/rxrpc.o
+insmod /tmp/rxkad.o
+insmod /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.91
mount -t afs \%root.afs. /afs
mount -t afs \%cambridge.redhat.com:root.cell. /afs/cambridge.redhat.com/
-echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells
+echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells
mount -t afs "#grand.central.org:root.cell." /afs/grand.central.org/
mount -t afs "#grand.central.org:root.archive." /afs/grand.central.org/archive
mount -t afs "#grand.central.org:root.contrib." /afs/grand.central.org/contrib
@@ -141,15 +243,7 @@ mount -t afs "#grand.central.org:root.service." /afs/grand.central.org/service
mount -t afs "#grand.central.org:root.software." /afs/grand.central.org/software
mount -t afs "#grand.central.org:root.user." /afs/grand.central.org/user
-umount /afs/grand.central.org/user
-umount /afs/grand.central.org/software
-umount /afs/grand.central.org/service
-umount /afs/grand.central.org/project
-umount /afs/grand.central.org/doc
-umount /afs/grand.central.org/contrib
-umount /afs/grand.central.org/archive
-umount /afs/grand.central.org
-umount /afs/cambridge.redhat.com
umount /afs
rmmod kafs
+rmmod rxkad
rmmod rxrpc
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 5484ab5efd4..7aaf09b86a5 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1421,6 +1421,15 @@ fewer messages that will be written. Message_burst controls when messages will
be dropped. The default settings limit warning messages to one every five
seconds.
+warnings
+--------
+
+This controls console messages from the networking stack that can occur because
+of problems on the network like duplicate address or bad checksums. Normally,
+this should be enabled, but if the problem persists the messages can be
+disabled.
+
+
netdev_max_backlog
------------------
diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt
index 750fe5e80eb..8ec54b974b6 100644
--- a/Documentation/infiniband/user_mad.txt
+++ b/Documentation/infiniband/user_mad.txt
@@ -91,6 +91,14 @@ Sending MADs
if (ret != sizeof *mad + mad_length)
perror("write");
+Transaction IDs
+
+ Users of the umad devices can use the lower 32 bits of the
+ transaction ID field (that is, the least significant half of the
+ field in network byte order) in MADs being sent to match
+ request/response pairs. The upper 32 bits are reserved for use by
+ the kernel and will be overwritten before a MAD is sent.
+
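
The split of the transaction ID described above can be sketched in C as
follows. This is an illustration only: tid_store() and tid_user_part() are
hypothetical helpers, not part of the umad interface, and they operate on the
raw 8-byte transaction ID field as it appears on the wire (network byte
order), so the user-controlled half is the last four bytes.

```c
#include <stdint.h>

/* Write a user-chosen 32-bit tag into the low half of an 8-byte transaction
 * ID field held in network byte order.  The upper half is zeroed here; per
 * the text above, the kernel overwrites it before the MAD is sent. */
void tid_store(uint8_t tid_field[8], uint32_t user_tid)
{
	tid_field[0] = tid_field[1] = tid_field[2] = tid_field[3] = 0;
	tid_field[4] = (uint8_t)(user_tid >> 24);
	tid_field[5] = (uint8_t)(user_tid >> 16);
	tid_field[6] = (uint8_t)(user_tid >> 8);
	tid_field[7] = (uint8_t)user_tid;
}

/* Recover the user-chosen tag from a received transaction ID field,
 * ignoring whatever the kernel placed in the upper 32 bits. */
uint32_t tid_user_part(const uint8_t tid_field[8])
{
	return ((uint32_t)tid_field[4] << 24) |
	       ((uint32_t)tid_field[5] << 16) |
	       ((uint32_t)tid_field[6] << 8) |
	       (uint32_t)tid_field[7];
}
```

A sender would call tid_store() with its own request counter and later match
responses by comparing tid_user_part() of the reply against that counter.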
Setting IsSM Capability Bit
To set the IsSM capability bit for a port, simply open the
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 60c665d9cfa..81d9aa09729 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -859,6 +859,18 @@ payload contents" for more information.
void unregister_key_type(struct key_type *type);
+Under some circumstances, it may be desirable to deal with a
+bundle of keys. The facility provides access to the keyring type for managing
+such a bundle:
+
+ struct key_type key_type_keyring;
+
+This can be used with a function such as request_key() to find a specific
+keyring in a process's keyrings. A keyring thus found can then be searched
+with keyring_search(). Note that it is not possible to use request_key() to
+search a specific keyring, so using keyrings in this way is of limited utility.
+
+
===================================
NOTES ON ACCESSING PAYLOAD CONTENTS
===================================
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index de809e58092..1da56663083 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -920,40 +920,9 @@ options, you may wish to use the "max_bonds" module parameter,
documented above.
To create multiple bonding devices with differing options, it
-is necessary to load the bonding driver multiple times. Note that
-current versions of the sysconfig network initialization scripts
-handle this automatically; if your distro uses these scripts, no
-special action is needed. See the section Configuring Bonding
-Devices, above, if you're not sure about your network initialization
-scripts.
-
- To load multiple instances of the module, it is necessary to
-specify a different name for each instance (the module loading system
-requires that every loaded module, even multiple instances of the same
-module, have a unique name). This is accomplished by supplying
-multiple sets of bonding options in /etc/modprobe.conf, for example:
-
-alias bond0 bonding
-options bond0 -o bond0 mode=balance-rr miimon=100
-
-alias bond1 bonding
-options bond1 -o bond1 mode=balance-alb miimon=50
-
- will load the bonding module two times. The first instance is
-named "bond0" and creates the bond0 device in balance-rr mode with an
-miimon of 100. The second instance is named "bond1" and creates the
-bond1 device in balance-alb mode with an miimon of 50.
-
- In some circumstances (typically with older distributions),
-the above does not work, and the second bonding instance never sees
-its options. In that case, the second options line can be substituted
-as follows:
-
-install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
- mode=balance-alb miimon=50
+is necessary to use bonding parameters exported by sysfs, documented
+in the section below.
- This may be repeated any number of times, specifying a new and
-unique name in place of bond1 for each subsequent instance.
3.4 Configuring Bonding Manually via Sysfs
------------------------------------------
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
index 387482e46c4..4504cc59e40 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -57,6 +57,16 @@ DCCP_SOCKOPT_SEND_CSCOV is for the receiver and has a different meaning: it
coverage value are also acceptable. The higher the number, the more
restrictive this setting (see [RFC 4340, sec. 9.2.1]).
+The following two options apply to CCID 3 exclusively and are getsockopt()-only.
+In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.
+DCCP_SOCKOPT_CCID_RX_INFO
+ Returns a `struct tfrc_rx_info' in optval; the buffer for optval and
+ optlen must be set to at least sizeof(struct tfrc_rx_info).
+DCCP_SOCKOPT_CCID_TX_INFO
+ Returns a `struct tfrc_tx_info' in optval; the buffer for optval and
+ optlen must be set to at least sizeof(struct tfrc_tx_info).
+
+
Sysctl variables
================
Several DCCP default parameters can be managed by the following sysctls
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 702d1d8dd04..af6a63ab902 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -179,11 +179,31 @@ tcp_fin_timeout - INTEGER
because they eat maximum 1.5K of memory, but they tend
to live longer. Cf. tcp_max_orphans.
-tcp_frto - BOOLEAN
+tcp_frto - INTEGER
Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
timeouts. It is particularly beneficial in wireless environments
where packet loss is typically due to random radio interference
- rather than intermediate router congestion.
+ rather than intermediate router congestion. If set to 1, basic
+ version is enabled. 2 enables SACK enhanced F-RTO, which is
+ EXPERIMENTAL. The basic version can be used also when SACK is
+ enabled for a flow through tcp_sack sysctl.
+
+tcp_frto_response - INTEGER
+ When F-RTO has detected that a TCP retransmission timeout was
+ spurious (i.e., the timeout would have been avoided had TCP set a
+ longer retransmission timeout), TCP has several options for what to do
+ next. Possible values are:
+ 0 Rate halving based; a smooth and conservative response,
+ results in halved cwnd and ssthresh after one RTT
+ 1 Very conservative response; not recommended because even
+ though being valid, it interacts poorly with the rest of
+ Linux TCP, halves cwnd and ssthresh immediately
+ 2 Aggressive response; undoes congestion control measures
+ that are now known to be unnecessary (ignoring the
+ possibility of a lost retransmission that would require
+ TCP to be more cautious), cwnd and ssthresh are restored
+ to the values prior to the timeout
+ Default: 0 (rate halving based)
tcp_keepalive_time - INTEGER
How often TCP sends out keepalive messages when keepalive is enabled.
@@ -995,7 +1015,12 @@ bridge-nf-call-ip6tables - BOOLEAN
Default: 1
bridge-nf-filter-vlan-tagged - BOOLEAN
- 1 : pass bridged vlan-tagged ARP/IP traffic to arptables/iptables.
+ 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables.
+ 0 : disable this.
+ Default: 1
+
+bridge-nf-filter-pppoe-tagged - BOOLEAN
+ 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables.
0 : disable this.
Default: 1
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
new file mode 100644
index 00000000000..cae231b1c13
--- /dev/null
+++ b/Documentation/networking/rxrpc.txt
@@ -0,0 +1,859 @@
+ ======================
+ RxRPC NETWORK PROTOCOL
+ ======================
+
+The RxRPC protocol driver provides a reliable two-phase transport on top of UDP
+that can be used to perform RxRPC remote operations. This is done over sockets
+of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and
+receive data, aborts and errors.
+
+Contents of this document:
+
+ (*) Overview.
+
+ (*) RxRPC protocol summary.
+
+ (*) AF_RXRPC driver model.
+
+ (*) Control messages.
+
+ (*) Socket options.
+
+ (*) Security.
+
+ (*) Example client usage.
+
+ (*) Example server usage.
+
+ (*) AF_RXRPC kernel interface.
+
+
+========
+OVERVIEW
+========
+
+RxRPC is a two-layer protocol. There is a session layer which provides
+reliable virtual connections using UDP over IPv4 (or IPv6) as the transport
+layer, but implements a real network protocol; and there's the presentation
+layer which renders structured data to binary blobs and back again using XDR
+(as does SunRPC):
+
+ +-------------+
+ | Application |
+ +-------------+
+ | XDR | Presentation
+ +-------------+
+ | RxRPC | Session
+ +-------------+
+ | UDP | Transport
+ +-------------+
+
+
+AF_RXRPC provides:
+
+ (1) Part of an RxRPC facility for both kernel and userspace applications by
+ making the session part of it a Linux network protocol (AF_RXRPC).
+
+ (2) A two-phase protocol. The client transmits a blob (the request) and then
+ receives a blob (the reply), and the server receives the request and then
+ transmits the reply.
+
+ (3) Retention of the reusable bits of the transport system set up for one call
+ to speed up subsequent calls.
+
+ (4) A secure protocol, using the Linux kernel's key retention facility to
+ manage security on the client end. The server end must of necessity be
+ more active in security negotiations.
+
+AF_RXRPC does not provide XDR marshalling/presentation facilities. That is
+left to the application. AF_RXRPC only deals in blobs. Even the operation ID
+is just the first four bytes of the request blob, and as such is beyond the
+kernel's interest.
+
+
+Sockets of AF_RXRPC family are:
+
+ (1) created as type SOCK_DGRAM;
+
+ (2) provided with a protocol of the type of underlying transport they're going
+ to use - currently only PF_INET is supported.
+
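
The two properties listed above translate into a single socket() call. The
sketch below is a probe, not an excerpt from the driver: rxrpc_supported() is
a hypothetical helper, and the fallback value 33 for AF_RXRPC is an assumption
for C libraries whose headers do not define it.

```c
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_RXRPC
#define AF_RXRPC 33	/* assumed value; normally supplied by <sys/socket.h> */
#endif

/* Try to create an AF_RXRPC socket of type SOCK_DGRAM whose protocol
 * argument names the underlying transport family (PF_INET).
 * Returns 1 if the kernel accepted the request, 0 otherwise. */
int rxrpc_supported(void)
{
	int fd = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

	if (fd < 0)
		return 0;	/* e.g. protocol not built or module not loaded */
	close(fd);
	return 1;
}
```

On a kernel without AF_RXRPC support the call is expected to fail, so the
helper simply reports 0 in that case.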
+
+The Andrew File System (AFS) is an example of an application that uses this and
+that has both kernel (filesystem) and userspace (utility) components.
+
+
+======================
+RXRPC PROTOCOL SUMMARY
+======================
+
+An overview of the RxRPC protocol:
+
+ (*) RxRPC sits on top of another networking protocol (UDP is the only option
+ currently), and uses this to provide network transport. UDP ports, for
+ example, provide transport endpoints.
+
+ (*) RxRPC supports multiple virtual "connections" from any given transport
+ endpoint, thus allowing the endpoints to be shared, even to the same
+ remote endpoint.
+
+ (*) Each connection goes to a particular "service". A connection may not go
+ to multiple services. A service may be considered the RxRPC equivalent of
+ a port number. AF_RXRPC permits multiple services to share an endpoint.
+
+ (*) Client-originating packets are marked, thus a transport endpoint can be
+ shared between client and server connections (connections have a
+ direction).
+
+ (*) Up to a billion connections may be supported concurrently between one
+ local transport endpoint and one service on one remote endpoint. An RxRPC
+ connection is described by seven numbers:
+
+ Local address }
+ Local port } Transport (UDP) address
+ Remote address }
+ Remote port }
+ Direction
+ Connection ID
+ Service ID
+
+ (*) Each RxRPC operation is a "call". A connection may make up to four
+ billion calls, but only up to four calls may be in progress on a
+ connection at any one time.
+
+ (*) Calls are two-phase and asymmetric: the client sends its request data,
+ which the service receives; then the service transmits the reply data
+ which the client receives.
+
+ (*) The data blobs are of indefinite size; the end of a phase is marked with a
+ flag in the packet. The number of packets of data making up one blob may
+ not exceed 4 billion, however, as this would cause the sequence number to
+ wrap.
+
+ (*) The first four bytes of the request data are the service operation ID.
+
+ (*) Security is negotiated on a per-connection basis. The connection is
+ initiated by the first data packet on it arriving. If security is
+ requested, the server then issues a "challenge" and then the client
+ replies with a "response". If the response is successful, the security is
+ set for the lifetime of that connection, and all subsequent calls made
+ upon it use that same security. In the event that the server lets a
+ connection lapse before the client, the security will be renegotiated if
+ the client uses the connection again.
+
+ (*) Calls use ACK packets to handle reliability. Data packets are also
+ explicitly sequenced per call.
+
+ (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs.
+ A hard-ACK indicates to the far side that all the data received to a point
+ has been received and processed; a soft-ACK indicates that the data has
+ been received but may yet be discarded and re-requested. The sender may
+ not discard any transmittable packets until they've been hard-ACK'd.
+
+ (*) Reception of a reply data packet implicitly hard-ACK's all the data
+ packets that make up the request.
+
+ (*) A call is complete when the request has been sent, the reply has been
+ received and the final hard-ACK on the last packet of the reply has
+ reached the server.
+
+ (*) A call may be aborted by either end at any time up to its completion.
+
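
The seven numbers that describe a connection, and the limit of four calls in
progress per connection, can be modelled in C roughly as below. This is a
sketch for illustration, not the kernel's data structure: struct conn_key,
its field widths, enum conn_direction and both helper functions are
assumptions made for this example, and only IPv4 transport addresses are
considered.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* Connections have a direction: who originated them. */
enum conn_direction { CONN_CLIENT, CONN_SERVER };

/* The seven numbers identifying one RxRPC connection (widths assumed). */
struct conn_key {
	struct in_addr local_addr;	/* } transport (UDP) address */
	uint16_t local_port;		/* }                          */
	struct in_addr remote_addr;	/* }                          */
	uint16_t remote_port;		/* }                          */
	enum conn_direction direction;
	uint32_t conn_id;
	uint16_t service_id;
};

/* Two keys name the same connection only if all seven numbers match. */
bool conn_key_equal(const struct conn_key *a, const struct conn_key *b)
{
	return a->local_addr.s_addr == b->local_addr.s_addr &&
	       a->local_port == b->local_port &&
	       a->remote_addr.s_addr == b->remote_addr.s_addr &&
	       a->remote_port == b->remote_port &&
	       a->direction == b->direction &&
	       a->conn_id == b->conn_id &&
	       a->service_id == b->service_id;
}

#define MAX_CALLS_IN_PROGRESS 4	/* per connection, as stated above */

/* A further call may start only while fewer than four are in progress. */
bool conn_can_start_call(unsigned int calls_in_progress)
{
	return calls_in_progress < MAX_CALLS_IN_PROGRESS;
}
```

Because the service ID is part of the key, two connections between the same
pair of UDP endpoints but to different services compare as distinct.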
+
+=====================
+AF_RXRPC DRIVER MODEL
+=====================
+
+About the AF_RXRPC driver:
+
+ (*) The AF_RXRPC protocol transparently uses internal sockets of the transport
+ protocol to represent transport endpoints.
+
+ (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC
+ connections are handled transparently. One client socket may be used to
+ make multiple simultaneous calls to the same service. One server socket
+ may handle calls from many clients.
+
+ (*) Additional parallel client connections will be initiated to support extra
+ concurrent calls, up to a tunable limit.
+
+ (*) Each connection is retained for a certain amount of time [tunable] after
+ the last call currently using it has completed in case a new call is made
+ that could reuse it.
+
+ (*) Each internal UDP socket is retained for a certain amount of
+ time [tunable] after the last connection using it is discarded, in case a new
+ connection is made that could use it.
+
+ (*) A client-side connection is only shared between calls if they have
+ the same key struct describing their security (and assuming the calls
+ would otherwise share the connection). Non-secured calls would also be
+ able to share connections with each other.
+
+ (*) A server-side connection is shared if the client says it is.
+
+ (*) ACK'ing is handled by the protocol driver automatically, including ping
+ replying.
+
+ (*) SO_KEEPALIVE automatically pings the other side to keep the connection
+ alive [TODO].
+
+ (*) If an ICMP error is received, all calls affected by that error will be
+ aborted with an appropriate network error passed through recvmsg().
+
+
+Interaction with the user of the RxRPC socket:
+
+ (*) A socket is made into a server socket by binding an address with a
+ non-zero service ID.
+
+ (*) In the client, sending a request is achieved with one or more sendmsgs,
+ followed by the reply being received with one or more recvmsgs.
+
+ (*) The first sendmsg for a request to be sent from a client contains a tag to
+ be used in all other sendmsgs or recvmsgs associated with that call. The
+ tag is carried in the control data.
+
+ (*) connect() is used to supply a default destination address for a client
+ socket. This may be overridden by supplying an alternate address to the
+ first sendmsg() of a call (struct msghdr::msg_name).
+
+ (*) If connect() is called on an unbound client, a random local port will
+ be bound before the operation takes place.
+
+ (*) A server socket may also be used to make client calls. To do this, the
+ first sendmsg() of the call must specify the target address. The server's
+ transport endpoint is used to send the packets.
+
+ (*) Once the application has received the last message associated with a call,
+ the tag is guaranteed not to be seen again, and so it can be used to pin
+ client resources. A new call can then be initiated with the same tag
+ without fear of interference.
+
+ (*) In the server, a request is received with one or more recvmsgs, then the
+ reply is transmitted with one or more sendmsgs, and then the final ACK
+ is received with a last recvmsg.
+
+ (*) When sending data for a call, sendmsg is given MSG_MORE if there's more
+ data to come on that call.
+
+ (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more
+ data to come for that call.
+
+ (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg
+ to indicate the terminal message for that call.
+
+ (*) A call may be aborted by adding an abort control message to the control
+ data. Issuing an abort terminates the kernel's use of that call's tag.
+ Any messages waiting in the receive queue for that call will be discarded.
+
+ (*) Aborts, busy notifications and challenge packets are delivered by recvmsg,
+ and control data messages will be set to indicate the context. Receiving
+ an abort or a busy message terminates the kernel's use of that call's tag.
+
+ (*) The control data part of the msghdr struct is used for a number of things:
+
+ (*) The tag of the intended or affected call.
+
+ (*) Sending or receiving errors, aborts and busy notifications.
+
+ (*) Notifications of incoming calls.
+
+ (*) Sending debug requests and receiving debug replies [TODO].
+
+ (*) When the kernel has received and set up an incoming call, it sends a
+ message to the server application to let it know there's a new call awaiting
+ its acceptance [recvmsg reports a special control message]. The server
+ application then uses sendmsg to assign a tag to the new call. Once that
+ is done, the first part of the request data will be delivered by recvmsg.
+
+ (*) The server application has to provide the server socket with a keyring of
+ secret keys corresponding to the security types it permits. When a secure
+ connection is being set up, the kernel looks up the appropriate secret key
+ in the keyring and then sends a challenge packet to the client and
+ receives a response packet. The kernel then checks the authorisation of
+ the packet and either aborts the connection or sets up the security.
+
+ (*) The name of the key a client will use to secure its communications is
+ nominated by a socket option.
+
+
+Notes on recvmsg:
+
+ (1) If there's a sequence of data messages belonging to a particular call on
+ the receive queue, then recvmsg will keep working through them until:
+
+ (a) it meets the end of that call's received data,
+
+ (b) it meets a non-data message,
+
+ (c) it meets a message belonging to a different call, or
+
+ (d) it fills the user buffer.
+
+ If recvmsg is called in blocking mode, it will keep sleeping, awaiting the
+ reception of further data, until one of the above four conditions is met.
+
+ (2) MSG_PEEK operates similarly, but will return immediately if it has put any
+ data in the buffer rather than sleeping until it can fill the buffer.
+
+ (3) If a data message is only partially consumed in filling a user buffer,
+ then the remainder of that message will be left on the front of the queue
+ for the next taker. MSG_TRUNC will never be flagged.
+
+ (4) If there is more data to be had on a call (it hasn't copied the last byte
+ of the last data message in that phase yet), then MSG_MORE will be
+ flagged.
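+
+As a rough illustration of how the MSG_MORE and MSG_EOR notes fit together, the
+flags returned by a single recvmsg() might be classified as follows. This is
+only a sketch: the enum and function names are invented for the example, and it
+assumes that a read returning neither flag marks the end of the current phase's
+data (for instance, the end of the request on a server).
+
```c
#include <assert.h>
#include <sys/socket.h>

/* What the flags from one recvmsg() say about the call's progress. */
enum call_progress {
	CALL_MORE_DATA,		/* MSG_MORE: more data to come in this phase */
	CALL_PHASE_DONE,	/* neither flag: this phase's data is complete */
	CALL_TERMINATED,	/* MSG_EOR: terminal message for the call */
};

/* msg_flags is the msghdr::msg_flags value filled in by recvmsg(). */
static enum call_progress classify_recv_flags(int msg_flags)
{
	if (msg_flags & MSG_EOR)
		return CALL_TERMINATED;
	if (msg_flags & MSG_MORE)
		return CALL_MORE_DATA;
	return CALL_PHASE_DONE;
}
```
+
+A caller would typically keep reading while CALL_MORE_DATA is reported and
+release whatever the tag pins once CALL_TERMINATED is seen.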
+
+
+================
+CONTROL MESSAGES
+================
+
+AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex
+calls, to invoke certain actions and to report certain conditions. These are:
+
+ MESSAGE ID SRT DATA MEANING
+ ======================= === =========== ===============================
+ RXRPC_USER_CALL_ID sr- User ID App's call specifier
+ RXRPC_ABORT srt Abort code Abort code to issue/received
+ RXRPC_ACK -rt n/a Final ACK received
+ RXRPC_NET_ERROR -rt error num Network error on call
+ RXRPC_BUSY -rt n/a Call rejected (server busy)
+ RXRPC_LOCAL_ERROR -rt error num Local error encountered
+ RXRPC_NEW_CALL -r- n/a New call received
+ RXRPC_ACCEPT s-- n/a Accept new call
+
+ (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message)
+
+ (*) RXRPC_USER_CALL_ID
+
+ This is used to indicate the application's call ID. It's an unsigned long
+ that the app specifies in the client by attaching it to the first data
+ message or in the server by passing it in association with an RXRPC_ACCEPT
+ message. recvmsg() passes it in conjunction with all messages except
+ RXRPC_NEW_CALL messages.
+
+ (*) RXRPC_ABORT
+
+ This can be used by an application to abort a call by passing it to
+ sendmsg, or it can be delivered by recvmsg to indicate a remote abort was
+ received. Either way, it must be associated with an RXRPC_USER_CALL_ID to
+ specify the call affected. If an abort is being sent, then error EBADSLT
+ will be returned if there is no call with that user ID.
+
+ (*) RXRPC_ACK
+
+ This is delivered to a server application to indicate that the final ACK
+ of a call was received from the client. It will be associated with an
+ RXRPC_USER_CALL_ID to indicate the call that's now complete.
+
+ (*) RXRPC_NET_ERROR
+
+ This is delivered to an application to indicate that an ICMP error message
+ was encountered in the process of trying to talk to the peer. An
+ errno-class integer value will be included in the control message data
+ indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
+ affected.
+
+ (*) RXRPC_BUSY
+
+ This is delivered to a client application to indicate that a call was
+ rejected by the server due to the server being busy. It will be
+ associated with an RXRPC_USER_CALL_ID to indicate the rejected call.
+
+ (*) RXRPC_LOCAL_ERROR
+
+ This is delivered to an application to indicate that a local error was
+ encountered and that a call has been aborted because of it. An
+ errno-class integer value will be included in the control message data
+ indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
+ affected.
+
+ (*) RXRPC_NEW_CALL
+
+ This is delivered to indicate to a server application that a new call has
+ arrived and is awaiting acceptance. No user ID is associated with this,
+ as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT.
+
+ (*) RXRPC_ACCEPT
+
+ This is used by a server application to attempt to accept a call and
+ assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID
+ to indicate the user ID to be assigned. If there is no call to be
+ accepted (it may have timed out, been aborted, etc.), then sendmsg will
+ return error ENODATA. If the user ID is already in use by another call,
+ then error EBADSLT will be returned.
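+
+The table above might be applied to a received message roughly as follows. This
+is a sketch rather than a reference implementation: struct call_event and
+parse_call_event() are invented for the example, the numeric values of the
+constants are assumed to match <linux/rxrpc.h> (which real code should include
+instead), and the abort code and error number are assumed to be carried as a
+4-byte unsigned value and an int respectively.
+
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed to match <linux/rxrpc.h>; real code should include that header. */
#ifndef SOL_RXRPC
#define SOL_RXRPC		272
#endif
#ifndef RXRPC_USER_CALL_ID
#define RXRPC_USER_CALL_ID	1
#define RXRPC_ABORT		2
#define RXRPC_ACK		3
#define RXRPC_NET_ERROR		5
#define RXRPC_BUSY		6
#define RXRPC_LOCAL_ERROR	7
#define RXRPC_NEW_CALL		8
#endif

/* Summary of the RxRPC control messages attached to one received message. */
struct call_event {
	unsigned long	call_id;	/* from RXRPC_USER_CALL_ID */
	int		has_call_id;	/* stays 0 for RXRPC_NEW_CALL */
	int		type;		/* 0 = plain data, else an RXRPC_* type */
	long		code;		/* abort code or error number, if any */
	int		terminal;	/* message ends the call (T column) */
};

static void parse_call_event(struct msghdr *msg, struct call_event *ev)
{
	struct cmsghdr *c;
	unsigned int abort_code;
	int err;

	memset(ev, 0, sizeof(*ev));
	for (c = CMSG_FIRSTHDR(msg); c; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level != SOL_RXRPC)
			continue;
		if (c->cmsg_type == RXRPC_USER_CALL_ID) {
			memcpy(&ev->call_id, CMSG_DATA(c), sizeof(ev->call_id));
			ev->has_call_id = 1;
			continue;
		}
		ev->type = c->cmsg_type;
		switch (c->cmsg_type) {
		case RXRPC_ABORT:
			memcpy(&abort_code, CMSG_DATA(c), sizeof(abort_code));
			ev->code = abort_code;
			ev->terminal = 1;
			break;
		case RXRPC_NET_ERROR:
		case RXRPC_LOCAL_ERROR:
			memcpy(&err, CMSG_DATA(c), sizeof(err));
			ev->code = err;
			ev->terminal = 1;
			break;
		case RXRPC_ACK:
		case RXRPC_BUSY:
			ev->terminal = 1;
			break;
		default:	/* RXRPC_NEW_CALL and anything unrecognised */
			break;
		}
	}
}
```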
+
+
+==============
+SOCKET OPTIONS
+==============
+
+AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
+
+ (*) RXRPC_SECURITY_KEY
+
+ This is used to specify the description of the key to be used. The key is
+ extracted from the calling process's keyrings with request_key() and
+ should be of "rxrpc" type.
+
+ The optval pointer points to the description string, and optlen indicates
+ how long the string is, without the NUL terminator.
+
+ (*) RXRPC_SECURITY_KEYRING
+
+ Similar to above but specifies a keyring of server secret keys to use (key
+ type "keyring"). See the "Security" section.
+
+ (*) RXRPC_EXCLUSIVE_CONNECTION
+
+ This is used to request that new connections should be used for each call
+ made subsequently on this socket. optval should be NULL and optlen 0.
+
+ (*) RXRPC_MIN_SECURITY_LEVEL
+
+ This is used to specify the minimum security level required for calls on
+ this socket. optval must point to an int containing one of the following
+ values:
+
+ (a) RXRPC_SECURITY_PLAIN
+
+ Encrypted checksum only.
+
+ (b) RXRPC_SECURITY_AUTH
+
+ Encrypted checksum plus packet padded and first eight bytes of packet
+ encrypted - which includes the actual packet length.
+
+ (c) RXRPC_SECURITY_ENCRYPTED
+
+ Encrypted checksum plus entire packet padded and encrypted, including
+ actual packet length.
+
+
+========
+SECURITY
+========
+
+Currently, only the kerberos 4 equivalent protocol has been implemented
+(security index 2 - rxkad). This requires the rxkad module to be loaded and,
+on the client, tickets of the appropriate type to be obtained from the AFS
+kaserver or the kerberos server and installed as "rxrpc" type keys. This is
+normally done using the klog program. An example simple klog program can be
+found at:
+
+ http://people.redhat.com/~dhowells/rxrpc/klog.c
+
+The payload provided to add_key() on the client should be of the following
+form:
+
+ struct rxrpc_key_sec2_v1 {
+ uint16_t security_index; /* 2 */
+ uint16_t ticket_length; /* length of ticket[] */
+ uint32_t expiry; /* time at which expires */
+ uint8_t kvno; /* key version number */
+ uint8_t __pad[3];
+ uint8_t session_key[8]; /* DES session key */
+ uint8_t ticket[0]; /* the encrypted ticket */
+ };
+
+Where the ticket blob is just appended to the above structure.
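+
+A minimal sketch of assembling such a payload in user space is given below.
+The helper name build_rxkad_payload() is invented for the example, the header
+fields are assumed to be in host byte order, and the structure is redeclared
+with a C99 flexible array member so that the fragment stands alone.
+
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct rxrpc_key_sec2_v1 {
	uint16_t	security_index;	/* 2 */
	uint16_t	ticket_length;	/* length of ticket[] */
	uint32_t	expiry;		/* time at which expires */
	uint8_t		kvno;		/* key version number */
	uint8_t		__pad[3];
	uint8_t		session_key[8];	/* DES session key */
	uint8_t		ticket[];	/* the encrypted ticket */
};

/*
 * Allocate a header with the ticket blob appended directly after it.
 * Returns a malloc'd buffer (the caller frees it) and stores the total
 * length in *payload_len, or returns NULL if allocation fails.
 */
static struct rxrpc_key_sec2_v1 *
build_rxkad_payload(const uint8_t session_key[8], uint8_t kvno,
		    uint32_t expiry, const void *ticket,
		    uint16_t ticket_len, size_t *payload_len)
{
	size_t len = sizeof(struct rxrpc_key_sec2_v1) + ticket_len;
	struct rxrpc_key_sec2_v1 *p = calloc(1, len);

	if (!p)
		return NULL;
	p->security_index = 2;		/* rxkad */
	p->ticket_length = ticket_len;
	p->expiry = expiry;
	p->kvno = kvno;
	memcpy(p->session_key, session_key, 8);
	memcpy(p->ticket, ticket, ticket_len);
	*payload_len = len;
	return p;
}
```
+
+The returned buffer and length would then be handed to add_key() as the
+payload of an "rxrpc" type key.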
+
+
+For the server, keys of type "rxrpc_s" must be made available to the server.
+They have a description of "<serviceID>:<securityIndex>" (eg: "52:2" for an
+rxkad key for the AFS VL service). When such a key is created, it should be
+given the server's secret key as the instantiation data (see the example
+below).
+
+ add_key("rxrpc_s", "52:2", secret_key, 8, keyring);
+
+A keyring is passed to the server socket by naming it in a sockopt. The server
+socket then looks the server secret keys up in this keyring when secure
+incoming connections are made. This can be seen in an example program that can
+be found at:
+
+ http://people.redhat.com/~dhowells/rxrpc/listen.c
+
+
+====================
+EXAMPLE CLIENT USAGE
+====================
+
+A client would issue an operation by:
+
+ (1) An RxRPC socket is set up by:
+
+ client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
+
+ Where the third parameter indicates the protocol family of the transport
+ socket used - usually IPv4 but it can also be IPv6 [TODO].
+
+ (2) A local address can optionally be bound:
+
+ struct sockaddr_rxrpc srx = {
+ .srx_family = AF_RXRPC,
+ .srx_service = 0, /* we're a client */
+ .transport_type = SOCK_DGRAM, /* type of transport socket */
+ .transport_len = sizeof(struct sockaddr_in),
+ .transport.sin.sin_family = AF_INET,
+ .transport.sin.sin_port = htons(7000), /* AFS callback */
+ .transport.sin.sin_addr.s_addr = htonl(INADDR_ANY), /* all local interfaces */
+ };
+ bind(client, (struct sockaddr *)&srx, sizeof(srx));
+
+ This specifies the local UDP port to be used. If not given, a random
+ non-privileged port will be used. A UDP port may be shared between
+ several unrelated RxRPC sockets. Security is handled on a per-RxRPC
+ virtual connection basis.
+
+ (3) The security is set:
+
+ const char *key = "AFS:cambridge.redhat.com";
+ setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));
+
+ This issues a request_key() to get the key representing the security
+ context. The minimum security level can be set:
+
+ unsigned int sec = RXRPC_SECURITY_ENCRYPTED;
+ setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
+ &sec, sizeof(sec));
+
+ (4) The server to be contacted can then be specified (alternatively this can
+ be done through sendmsg):
+
+ struct sockaddr_rxrpc srx = {
+ .srx_family = AF_RXRPC,
+ .srx_service = VL_SERVICE_ID,
+ .transport_type = SOCK_DGRAM, /* type of transport socket */
+ .transport_len = sizeof(struct sockaddr_in),
+ .transport.sin.sin_family = AF_INET,
+ .transport.sin.sin_port = htons(7005), /* AFS volume manager */
+ .transport.sin.sin_addr.s_addr = ...,
+ };
+ connect(client, (struct sockaddr *)&srx, sizeof(srx));
+
+ (5) The request data should then be posted to the client socket using a series
+ of sendmsg() calls, each with the following control message attached:
+
+ RXRPC_USER_CALL_ID - specifies the user ID for this call
+
+ MSG_MORE should be set in msghdr::msg_flags on all but the last part of
+ the request. Multiple requests may be made simultaneously.
+
+ If a call is intended to go to a destination other than the default
+ specified through connect(), then msghdr::msg_name should be set on the
+ first request message of that call.
+
+ (6) The reply data will then be posted to the client socket for recvmsg() to
+ pick up. MSG_MORE will be flagged by recvmsg() if there's more reply data
+ for a particular call to be read. MSG_EOR will be set on the terminal
+ read for a call.
+
+ All data will be delivered with the following control message attached:
+
+ RXRPC_USER_CALL_ID - specifies the user ID for this call
+
+ If an abort or error occurred, this will be returned in the control data
+ buffer instead, and MSG_EOR will be flagged to indicate the end of that
+ call.
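+
+Step (5) might be prepared along the following lines. This is a sketch under
+stated assumptions: prepare_request_part() and union call_id_cbuf are invented
+for the example, and the values of SOL_RXRPC and RXRPC_USER_CALL_ID are assumed
+to match <linux/rxrpc.h>, which real code should include instead.
+
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Assumed to match <linux/rxrpc.h>; real code should include that header. */
#ifndef SOL_RXRPC
#define SOL_RXRPC		272
#endif
#ifndef RXRPC_USER_CALL_ID
#define RXRPC_USER_CALL_ID	1
#endif

/* Control buffer just large enough for one RXRPC_USER_CALL_ID message. */
union call_id_cbuf {
	char		buf[CMSG_SPACE(sizeof(unsigned long))];
	struct cmsghdr	align;	/* forces suitable alignment of buf */
};

/*
 * Fill in msg so that it carries one chunk of request data tagged with
 * call_id.  iov and cbuf must stay valid until sendmsg() has returned.
 */
static void prepare_request_part(struct msghdr *msg, struct iovec *iov,
				 union call_id_cbuf *cbuf,
				 unsigned long call_id,
				 void *data, size_t len)
{
	struct cmsghdr *c;

	memset(msg, 0, sizeof(*msg));
	memset(cbuf, 0, sizeof(*cbuf));

	iov->iov_base = data;
	iov->iov_len = len;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
	msg->msg_control = cbuf->buf;
	msg->msg_controllen = sizeof(cbuf->buf);

	c = CMSG_FIRSTHDR(msg);
	c->cmsg_level = SOL_RXRPC;
	c->cmsg_type = RXRPC_USER_CALL_ID;
	c->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(c), &call_id, sizeof(call_id));
}
```
+
+The prepared message would then be passed to sendmsg(client, &msg, MSG_MORE)
+for every part but the last, and to sendmsg(client, &msg, 0) for the last.
+From user space the flag goes in the flags argument, since sendmsg() ignores
+the msg_flags field of the msghdr it is given.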
+
+
+====================
+EXAMPLE SERVER USAGE
+====================
+
+A server would be set up to accept operations in the following manner:
+
+ (1) An RxRPC socket is created by:
+
+ server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
+
+ Where the third parameter indicates the address type of the transport
+ socket used - usually IPv4.
+
+ (2) Security is set up if desired by giving the socket a keyring with server
+ secret keys in it:
+
+ keyring = add_key("keyring", "AFSkeys", NULL, 0,
+ KEY_SPEC_PROCESS_KEYRING);
+
+ const unsigned char secret_key[8] = {
+ 0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 };
+ add_key("rxrpc_s", "52:2", secret_key, 8, keyring);
+
+ setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7);
+
+ The keyring can be manipulated after it has been given to the socket. This
+ permits the server to add more keys, replace keys, etc. whilst it is live.
+
+ (3) A local address must then be bound:
+
+ struct sockaddr_rxrpc srx = {
+ .srx_family = AF_RXRPC,
+ .srx_service = VL_SERVICE_ID, /* RxRPC service ID */
+ .transport_type = SOCK_DGRAM, /* type of transport socket */
+ .transport_len = sizeof(struct sockaddr_in),
+ .transport.sin.sin_family = AF_INET,
+ .transport.sin.sin_port = htons(7000), /* AFS callback */
+ .transport.sin.sin_addr.s_addr = htonl(INADDR_ANY), /* all local interfaces */
+ };
+ bind(server, (struct sockaddr *)&srx, sizeof(srx));
+
+ (4) The server is then set to listen out for incoming calls:
+
+ listen(server, 100);
+
+ (5) The kernel notifies the server of pending incoming connections by sending
+ it a message for each. This is received with recvmsg() on the server
+ socket. It has no data, and has a single dataless control message
+ attached:
+
+ RXRPC_NEW_CALL
+
+ The address that can be passed back by recvmsg() at this point should be
+ ignored since the call for which the message was posted may have gone by
+ the time it is accepted - in which case the first call still on the queue
+ will be accepted.
+
+ (6) The server then accepts the new call by issuing a sendmsg() with two
+ pieces of control data and no actual data:
+
+ RXRPC_ACCEPT - indicate connection acceptance
+ RXRPC_USER_CALL_ID - specify user ID for this call
+
+ (7) The first request data packet will then be posted to the server socket for
+ recvmsg() to pick up. At that point, the RxRPC address for the call can
+ be read from the address fields in the msghdr struct.
+
+ Subsequent request data will be posted to the server socket for recvmsg()
+ to collect as it arrives. All but the last piece of the request data will
+ be delivered with MSG_MORE flagged.
+
+ All data will be delivered with the following control message attached:
+
+ RXRPC_USER_CALL_ID - specifies the user ID for this call
+
+ (8) The reply data should then be posted to the server socket using a series
+ of sendmsg() calls, each with the following control messages attached:
+
+ RXRPC_USER_CALL_ID - specifies the user ID for this call
+
+ MSG_MORE should be set in msghdr::msg_flags on all but the last message
+ for a particular call.
+
+ (9) The final ACK from the client will be posted for retrieval by recvmsg()
+ when it is received. It will take the form of a dataless message with two
+ control messages attached:
+
+ RXRPC_USER_CALL_ID - specifies the user ID for this call
+ RXRPC_ACK - indicates final ACK (no data)
+
+ MSG_EOR will be flagged to indicate that this is the final message for
+ this call.
+
+(10) Up to the point the final packet of reply data is sent, the call can be
+ aborted by calling sendmsg() with a dataless message with the following
+ control messages attached:
+
+ RXRPC_USER_CALL_ID - specifies the user ID for this call
+ RXRPC_ABORT - indicates abort code (4 byte data)
+
+ Any packets waiting in the socket's receive queue will be discarded if
+ this is issued.
+
+Note that all the communications for a particular service take place through
+the one server socket, using control messages on sendmsg() and recvmsg() to
+determine the call affected.
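+
+The dataless accept and abort messages described in the steps above could be
+assembled as in the following sketch. The helper names (put_cmsg(),
+prepare_accept(), prepare_abort()) and union server_cbuf are invented for the
+example, and the constant values are assumed to match <linux/rxrpc.h>, which
+real code should include instead.
+
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed to match <linux/rxrpc.h>; real code should include that header. */
#ifndef SOL_RXRPC
#define SOL_RXRPC		272
#endif
#ifndef RXRPC_USER_CALL_ID
#define RXRPC_USER_CALL_ID	1
#define RXRPC_ABORT		2
#define RXRPC_ACCEPT		9
#endif

/* Room for a call ID plus either RXRPC_ACCEPT or a 4-byte abort code. */
union server_cbuf {
	char		buf[CMSG_SPACE(sizeof(unsigned long)) +
			    CMSG_SPACE(sizeof(uint32_t))];
	struct cmsghdr	align;	/* forces suitable alignment of buf */
};

/* Append one control message at offset *used and advance the offset. */
static void put_cmsg(union server_cbuf *cb, size_t *used, int type,
		     const void *data, size_t len)
{
	struct cmsghdr *c = (struct cmsghdr *)(cb->buf + *used);

	c->cmsg_level = SOL_RXRPC;
	c->cmsg_type = type;
	c->cmsg_len = CMSG_LEN(len);
	if (len)
		memcpy(CMSG_DATA(c), data, len);
	*used += CMSG_SPACE(len);
}

/* Dataless message: RXRPC_ACCEPT plus the user ID to assign. */
static void prepare_accept(struct msghdr *msg, union server_cbuf *cb,
			   unsigned long call_id)
{
	size_t used = 0;

	memset(msg, 0, sizeof(*msg));
	memset(cb, 0, sizeof(*cb));
	put_cmsg(cb, &used, RXRPC_ACCEPT, NULL, 0);
	put_cmsg(cb, &used, RXRPC_USER_CALL_ID, &call_id, sizeof(call_id));
	msg->msg_control = cb->buf;
	msg->msg_controllen = used;
}

/* Dataless message: the user ID of the call plus a 4-byte abort code. */
static void prepare_abort(struct msghdr *msg, union server_cbuf *cb,
			  unsigned long call_id, uint32_t abort_code)
{
	size_t used = 0;

	memset(msg, 0, sizeof(*msg));
	memset(cb, 0, sizeof(*cb));
	put_cmsg(cb, &used, RXRPC_USER_CALL_ID, &call_id, sizeof(call_id));
	put_cmsg(cb, &used, RXRPC_ABORT, &abort_code, sizeof(abort_code));
	msg->msg_control = cb->buf;
	msg->msg_controllen = used;
}
```
+
+In both cases the message would be sent with sendmsg(server, &msg, 0). No
+iovec is attached, which is what makes the message dataless.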
+
+
+=========================
+AF_RXRPC KERNEL INTERFACE
+=========================
+
+The AF_RXRPC module also provides an interface for use by in-kernel utilities
+such as the AFS filesystem. This permits such a utility to:
+
+ (1) Use different keys directly on individual client calls on one socket
+ rather than having to open a whole slew of sockets, one for each key it
+ might want to use.
+
+ (2) Avoid having RxRPC call request_key() at the point of issue of a call or
+ opening of a socket. Instead the utility is responsible for requesting a
+ key at the appropriate point. AFS, for instance, would do this during VFS
+ operations such as open() or unlink(). The key is then handed through
+ when the call is initiated.
+
+ (3) Request the use of something other than GFP_KERNEL to allocate memory.
+
+ (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be
+ intercepted before they get put into the socket Rx queue and the socket
+ buffers manipulated directly.
+
+To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
+bind an address as appropriate and listen if it's to be a server socket, but
+then it passes this to the kernel interface functions.
+
+The kernel interface functions are as follows:
+
+ (*) Begin a new client call.
+
+ struct rxrpc_call *
+ rxrpc_kernel_begin_call(struct socket *sock,
+ struct sockaddr_rxrpc *srx,
+ struct key *key,
+ unsigned long user_call_ID,
+ gfp_t gfp);
+
+ This allocates the infrastructure to make a new RxRPC call and assigns
+ call and connection numbers. The call will be made on the UDP port that
+ the socket is bound to. The call will go to the destination address of a
+ connected client socket unless an alternative is supplied (srx is
+ non-NULL).
+
+ If a key is supplied then this will be used to secure the call instead of
+ the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls
+ secured in this way will still share connections if at all possible.
+
+ The user_call_ID is equivalent to that supplied to sendmsg() in the
+ control data buffer. It is entirely feasible to use this to point to a
+ kernel data structure.
+
+ If this function is successful, an opaque reference to the RxRPC call is
+ returned. The caller now holds a reference on this and it must be
+ properly ended.
+
+ (*) End a client call.
+
+ void rxrpc_kernel_end_call(struct rxrpc_call *call);
+
+ This is used to end a previously begun call. The user_call_ID is expunged
+ from AF_RXRPC's knowledge and will not be seen again in association with
+ the specified call.
+
+ (*) Send data through a call.
+
+ int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
+ size_t len);
+
+ This is used to supply either the request part of a client call or the
+ reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the
+ data buffers to be used. msg_iov may not be NULL and must point
+ exclusively to in-kernel virtual addresses. msg.msg_flags may be given
+ MSG_MORE if there will be subsequent data sends for this call.
+
+ The msg must not specify a destination address, control data or any flags
+ other than MSG_MORE. len is the total amount of data to transmit.
+
+ (*) Abort a call.
+
+ void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code);
+
+ This is used to abort a call if it's still in an abortable state. The
+ abort code specified will be placed in the ABORT message sent.
+
+ (*) Intercept received RxRPC messages.
+
+ typedef void (*rxrpc_interceptor_t)(struct sock *sk,
+ unsigned long user_call_ID,
+ struct sk_buff *skb);
+
+ void
+ rxrpc_kernel_intercept_rx_messages(struct socket *sock,
+ rxrpc_interceptor_t interceptor);
+
+ This installs an interceptor function on the specified AF_RXRPC socket.
+ All messages that would otherwise wind up in the socket's Rx queue are
+ then diverted to this function. Note that care must be taken to process
+ the messages in the right order to maintain DATA message sequentiality.
+
+ The interceptor function itself is provided with the address of the socket
+ handling the incoming message, the ID assigned by the kernel utility
+ to the call and the socket buffer containing the message.
+
+ The skb->mark field indicates the type of message:
+
+ MARK MEANING
+ =============================== =======================================
+ RXRPC_SKB_MARK_DATA Data message
+ RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call
+ RXRPC_SKB_MARK_BUSY Client call rejected as server busy
+ RXRPC_SKB_MARK_REMOTE_ABORT Call aborted by peer
+ RXRPC_SKB_MARK_NET_ERROR Network error detected
+ RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered
+ RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance
+
+ The remote abort message can be probed with rxrpc_kernel_get_abort_code().
+ The two error messages can be probed with rxrpc_kernel_get_error_number().
+ A new call can be accepted with rxrpc_kernel_accept_call().
+
+ Data messages can have their contents extracted with the usual bunch of
+ socket buffer manipulation functions. A data message can be determined to
+ be the last one in a sequence with rxrpc_kernel_is_data_last(). When a
+ data message has been used up, rxrpc_kernel_data_delivered() should be
+ called on it.
+
+ Non-data messages should be handed to rxrpc_kernel_free_skb() to dispose
+ of. It is possible to get extra refs on all types of message for later
+ freeing, but this may pin the state of a call until the message is finally
+ freed.
+
+ (*) Accept an incoming call.
+
+ struct rxrpc_call *
+ rxrpc_kernel_accept_call(struct socket *sock,
+ unsigned long user_call_ID);
+
+ This is used to accept an incoming call and to assign it a call ID. This
+ function is similar to rxrpc_kernel_begin_call() and calls accepted must
+ be ended in the same way.
+
+ If this function is successful, an opaque reference to the RxRPC call is
+ returned. The caller now holds a reference on this and it must be
+ properly ended.
+
+ (*) Reject an incoming call.
+
+ int rxrpc_kernel_reject_call(struct socket *sock);
+
+ This is used to reject the first incoming call on the socket's queue with
+ a BUSY message. -ENODATA is returned if there were no incoming calls.
+ Other errors may be returned if the call had been aborted (-ECONNABORTED)
+ or had timed out (-ETIME).
+
+ (*) Record the delivery of a data message and free it.
+
+ void rxrpc_kernel_data_delivered(struct sk_buff *skb);
+
+ This is used to record a data message as having been delivered and to
+ update the ACK state for the call. The socket buffer will be freed.
+
+ (*) Free a message.
+
+ void rxrpc_kernel_free_skb(struct sk_buff *skb);
+
+ This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC
+ socket.
+
+ (*) Determine if a data message is the last one on a call.
+
+ bool rxrpc_kernel_is_data_last(struct sk_buff *skb);
+
+ This is used to determine if a socket buffer holds the last data message
+ to be received for a call (true will be returned if it does, false
+ if not).
+
+ The data message will be part of the reply on a client call and the
+ request on an incoming call. In the latter case there will be more
+ messages, but in the former case there will not.
+
+ (*) Get the abort code from an abort message.
+
+ u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb);
+
+ This is used to extract the abort code from a remote abort message.
+
+ (*) Get the error number from a local or network error message.
+
+ int rxrpc_kernel_get_error_number(struct sk_buff *skb);
+
+ This is used to extract the error number from a message indicating either
+ a local error occurred or a network error occurred.
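+
+The way an interceptor might dispatch on skb->mark can be modelled outside the
+kernel as below. Everything in the first half is a stub written purely so that
+the sketch compiles in user space: the enum values, the cut-down struct sk_buff
+and the bodies of the rxrpc_kernel_*() functions are all stand-ins, not the
+kernel's definitions. struct my_call and my_interceptor() are likewise invented
+names, and the call record is reached through user_call_ID on the assumption
+that the utility stored a pointer there.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Stubs so the sketch compiles outside the kernel (hypothetical) ---- */
enum {
	RXRPC_SKB_MARK_DATA, RXRPC_SKB_MARK_FINAL_ACK, RXRPC_SKB_MARK_BUSY,
	RXRPC_SKB_MARK_REMOTE_ABORT, RXRPC_SKB_MARK_NET_ERROR,
	RXRPC_SKB_MARK_LOCAL_ERROR, RXRPC_SKB_MARK_NEW_CALL,
};
struct sock;
struct sk_buff { unsigned int mark; uint32_t abort_code; int error; bool last; };
static int delivered, freed;	/* count calls to the two disposal stubs */

static void rxrpc_kernel_data_delivered(struct sk_buff *skb) { (void)skb; delivered++; }
static void rxrpc_kernel_free_skb(struct sk_buff *skb) { (void)skb; freed++; }
static uint32_t rxrpc_kernel_get_abort_code(struct sk_buff *skb) { return skb->abort_code; }
static int rxrpc_kernel_get_error_number(struct sk_buff *skb) { return skb->error; }
static bool rxrpc_kernel_is_data_last(struct sk_buff *skb) { return skb->last; }

/* ---- The part that mirrors the text above ---- */
struct my_call {
	bool		data_complete;	/* last data message has been seen */
	bool		done;		/* a terminal message has been seen */
	uint32_t	abort_code;
	int		error;
};

static void my_interceptor(struct sock *sk, unsigned long user_call_ID,
			   struct sk_buff *skb)
{
	/* Not dereferenced for NEW_CALL, which has no ID assigned yet. */
	struct my_call *call = (struct my_call *)user_call_ID;

	(void)sk;
	switch (skb->mark) {
	case RXRPC_SKB_MARK_DATA:
		/* The payload would be consumed here. */
		call->data_complete = rxrpc_kernel_is_data_last(skb);
		rxrpc_kernel_data_delivered(skb);	/* also frees the skb */
		return;
	case RXRPC_SKB_MARK_REMOTE_ABORT:
		call->abort_code = rxrpc_kernel_get_abort_code(skb);
		call->done = true;
		break;
	case RXRPC_SKB_MARK_NET_ERROR:
	case RXRPC_SKB_MARK_LOCAL_ERROR:
		call->error = rxrpc_kernel_get_error_number(skb);
		call->done = true;
		break;
	case RXRPC_SKB_MARK_FINAL_ACK:
	case RXRPC_SKB_MARK_BUSY:
		call->done = true;
		break;
	default:
		/* NEW_CALL: rxrpc_kernel_accept_call() would be used here. */
		break;
	}
	rxrpc_kernel_free_skb(skb);	/* all non-data messages */
}
```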
diff --git a/Documentation/networking/wan-router.txt b/Documentation/networking/wan-router.txt
index 653978dcea7..07dd6d9930a 100644
--- a/Documentation/networking/wan-router.txt
+++ b/Documentation/networking/wan-router.txt
@@ -250,7 +250,6 @@ PRODUCT COMPONENTS AND RELATED FILES
sdladrv.h SDLA support module API definitions
sdlasfm.h SDLA firmware module definitions
if_wanpipe.h WANPIPE Socket definitions
- if_wanpipe_common.h WANPIPE Socket/Driver common definitions.
sdlapci.h WANPIPE PCI definitions
diff --git a/Documentation/s390/crypto/crypto-API.txt b/Documentation/s390/crypto/crypto-API.txt
deleted file mode 100644
index 71ae6ca9f2c..00000000000
--- a/Documentation/s390/crypto/crypto-API.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-crypto-API support for z990 Message Security Assist (MSA) instructions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-AUTHOR: Thomas Spatzier (tspat@de.ibm.com)
-
-
-1. Introduction crypto-API
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-See Documentation/crypto/api-intro.txt for an introduction/description of the
-kernel crypto API.
-According to api-intro.txt support for z990 crypto instructions has been added
-in the algorithm api layer of the crypto API. Several files containing z990
-optimized implementations of crypto algorithms are placed in the
-arch/s390/crypto directory.
-
-
-2. Probing for availability of MSA
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-It should be possible to use Kernels with the z990 crypto implementations both
-on machines with MSA available and on those without MSA (pre z990 or z990
-without MSA). Therefore a simple probing mechanism has been implemented:
-In the init function of each crypto module the availability of MSA and of the
-respective crypto algorithm in particular will be tested. If the algorithm is
-available the module will load and register its algorithm with the crypto API.
-
-If the respective crypto algorithm is not available, the init function will
-return -ENOSYS. In that case a fallback to the standard software implementation
-of the crypto algorithm must be taken ( -> the standard crypto modules are
-also built when compiling the kernel).
-
-
-3. Ensuring z990 crypto module preference
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-If z990 crypto instructions are available the optimized modules should be
-preferred instead of standard modules.
-
-3.1. compiled-in modules
-~~~~~~~~~~~~~~~~~~~~~~~~
-For compiled-in modules it has to be ensured that the z990 modules are linked
-before the standard crypto modules. Then, on system startup the init functions
-of z990 crypto modules will be called first and query for availability of z990
-crypto instructions. If instruction is available, the z990 module will register
-its crypto algorithm implementation -> the load of the standard module will fail
-since the algorithm is already registered.
-If z990 crypto instruction is not available the load of the z990 module will
-fail -> the standard module will load and register its algorithm.
-
-3.2. dynamic modules
-~~~~~~~~~~~~~~~~~~~~
-A system administrator has to take care of giving preference to z990 crypto
-modules. If MSA is available appropriate lines have to be added to
-/etc/modprobe.conf.
-
-Example: z990 crypto instruction for SHA1 algorithm is available
-
- add the following line to /etc/modprobe.conf (assuming the
- z990 crypto modules for SHA1 is called sha1_z990):
-
- alias sha1 sha1_z990
-
- -> when the sha1 algorithm is requested through the crypto API
- (which has a module autoloader) the z990 module will be loaded.
-
-TBD: a userspace module probing mechanism
- something like 'probe sha1 sha1_z990 sha1' in modprobe.conf
- -> try module sha1_z990, if it fails to load standard module sha1
- the 'probe' statement is currently not supported in modprobe.conf
-
-
-4. Currently implemented z990 crypto algorithms
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The following crypto algorithms with z990 MSA support are currently implemented.
-The name of each algorithm under which it is registered in crypto API and the
-name of the respective module is given in square brackets.
-
-- SHA1 Digest Algorithm [sha1 -> sha1_z990]
-- DES Encrypt/Decrypt Algorithm (64bit key) [des -> des_z990]
-- Triple DES Encrypt/Decrypt Algorithm (128bit key) [des3_ede128 -> des_z990]
-- Triple DES Encrypt/Decrypt Algorithm (192bit key) [des3_ede -> des_z990]
-
-In order to load, for example, the sha1_z990 module when the sha1 algorithm is
-requested (see 3.2.) add 'alias sha1 sha1_z990' to /etc/modprobe.conf.
-
diff --git a/Documentation/s390/zfcpdump.txt b/Documentation/s390/zfcpdump.txt
new file mode 100644
index 00000000000..cf45d27c460
--- /dev/null
+++ b/Documentation/s390/zfcpdump.txt
@@ -0,0 +1,87 @@
+s390 SCSI dump tool (zfcpdump)
+
+System z machines (z900 or higher) provide hardware support for creating system
+dumps on SCSI disks. The dump process is initiated by booting a dump tool, which
+has to create a dump of the current (probably crashed) Linux image. In order to
+not overwrite memory of the crashed Linux with data of the dump tool, the
+hardware saves some memory plus the register sets of the boot cpu before the
+dump tool is loaded. There exists an SCLP hardware interface to obtain the saved
+memory afterwards. Currently 32 MB are saved.
+
+This zfcpdump implementation consists of a Linux dump kernel together with
+a userspace dump tool, which are loaded together into the saved memory region
+below 32 MB. zfcpdump is installed on a SCSI disk using zipl (as contained in
+the s390-tools package) to make the device bootable. The operator of a Linux
+system can then trigger a SCSI dump by booting the SCSI disk on which zfcpdump
+resides.
+
+The kernel part of zfcpdump is implemented as a debugfs file under "zcore/mem",
+which exports memory and registers of the crashed Linux in an s390
+standalone dump format. It can be used in the same way as e.g. /dev/mem. The
+dump format defines a 4K header followed by plain uncompressed memory. The
+register sets are stored in the prefix pages of the respective cpus. To build a
+dump enabled kernel with the zcore driver, the kernel config option
+CONFIG_ZFCPDUMP has to be set. When reading from "zcore/mem", the part of
+memory that has been saved by the hardware is read by the driver via the SCLP
+hardware interface. The remaining part is copied directly from real memory,
+which has not been overwritten.
+
+The userspace application of zfcpdump can reside e.g. in an initramfs or an
+initrd. It reads from zcore/mem and writes the system dump to a file on a
+SCSI disk.
+
+To build a zfcpdump kernel use the following settings in your kernel
+configuration:
+ * CONFIG_ZFCPDUMP=y
+ * Enable ZFCP driver
+ * Enable SCSI driver
+ * Enable ext2 and ext3 filesystems
+ * Disable as many features as possible to keep the kernel small.
+ E.g. network support is not needed at all.
+
+To use the zfcpdump userspace application in an initramfs you have to do the
+following:
+
+ * Copy the zfcpdump executable somewhere into your Linux tree.
+ E.g. to "arch/s390/boot/zfcpdump". If you do not want to include
+ shared libraries, compile the tool with the "-static" gcc option.
+ * If you want to include e2fsck, add it to your source tree, too. The zfcpdump
+ application attempts to start /sbin/e2fsck from the ramdisk.
+ * Use an initramfs config file like the following:
+
+ dir /dev 755 0 0
+ nod /dev/console 644 0 0 c 5 1
+ nod /dev/null 644 0 0 c 1 3
+ nod /dev/sda1 644 0 0 b 8 1
+ nod /dev/sda2 644 0 0 b 8 2
+ nod /dev/sda3 644 0 0 b 8 3
+ nod /dev/sda4 644 0 0 b 8 4
+ nod /dev/sda5 644 0 0 b 8 5
+ nod /dev/sda6 644 0 0 b 8 6
+ nod /dev/sda7 644 0 0 b 8 7
+ nod /dev/sda8 644 0 0 b 8 8
+ nod /dev/sda9 644 0 0 b 8 9
+ nod /dev/sda10 644 0 0 b 8 10
+ nod /dev/sda11 644 0 0 b 8 11
+ nod /dev/sda12 644 0 0 b 8 12
+ nod /dev/sda13 644 0 0 b 8 13
+ nod /dev/sda14 644 0 0 b 8 14
+ nod /dev/sda15 644 0 0 b 8 15
+ file /init arch/s390/boot/zfcpdump 755 0 0
+ file /sbin/e2fsck arch/s390/boot/e2fsck 755 0 0
+ dir /proc 755 0 0
+ dir /sys 755 0 0
+ dir /mnt 755 0 0
+ dir /sbin 755 0 0
+
+ * Issue "make image" to build the zfcpdump image with initramfs.
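+
+The fifteen "nod" entries for /dev/sda1 to /dev/sda15 in the config file above
+follow a fixed pattern (block major 8, minor equal to the partition number), so
+they could be generated rather than typed. The following is only a convenience
+sketch; format_sda_nod() and print_sda_nods() are names invented for the
+example.
+
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the initramfs config entry for partition "part" of /dev/sda,
 * using the same mode, owner and device numbers as the listing above.
 * Returns the value of snprintf().
 */
static int format_sda_nod(char *out, size_t out_len, int part)
{
	return snprintf(out, out_len, "nod /dev/sda%d 644 0 0 b 8 %d",
			part, part);
}

/* Write the entries for partitions 1 to 15, one per line, to f. */
static void print_sda_nods(FILE *f)
{
	char line[64];
	int part;

	for (part = 1; part <= 15; part++) {
		format_sda_nod(line, sizeof(line), part);
		fprintf(f, "%s\n", line);
	}
}
```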
+
+In a Linux distribution the zfcpdump enabled kernel image must be copied to
+/usr/share/zfcpdump/zfcpdump.image, where the s390 zipl tool is looking for the
+dump kernel when preparing a SCSI dump disk.
+
+If you use a ramdisk, copy it to "/usr/share/zfcpdump/zfcpdump.rd".
+
+For more information on how to use zfcpdump refer to the s390 book 'Using the
+Dump Tools', which is available from
+http://www.ibm.com/developerworks/linux/linux390.