Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/00-INDEX | 2
-rw-r--r-- Documentation/ABI/testing/sysfs-bus-fcoe | 45
-rw-r--r-- Documentation/ABI/testing/sysfs-bus-mei | 7
-rw-r--r-- Documentation/ABI/testing/sysfs-bus-usb | 6
-rw-r--r-- Documentation/ABI/testing/sysfs-devices-system-cpu | 12
-rw-r--r-- Documentation/ABI/testing/sysfs-platform-msi-laptop | 83
-rw-r--r-- Documentation/DocBook/device-drivers.tmpl | 2
-rw-r--r-- Documentation/RCU/checklist.txt | 26
-rw-r--r-- Documentation/RCU/lockdep.txt | 5
-rw-r--r-- Documentation/RCU/rcubarrier.txt | 15
-rw-r--r-- Documentation/RCU/stallwarn.txt | 33
-rw-r--r-- Documentation/RCU/whatisRCU.txt | 4
-rw-r--r-- Documentation/SubmittingPatches | 12
-rw-r--r-- Documentation/arm/sunxi/clocks.txt | 56
-rw-r--r-- Documentation/backlight/lp855x-driver.txt | 7
-rw-r--r-- Documentation/cgroups/cgroups.txt | 3
-rw-r--r-- Documentation/cgroups/devices.txt | 70
-rw-r--r-- Documentation/cgroups/memory.txt | 70
-rw-r--r-- Documentation/clk.txt | 15
-rw-r--r-- Documentation/device-mapper/cache-policies.txt | 77
-rw-r--r-- Documentation/device-mapper/cache.txt | 243
-rw-r--r-- Documentation/device-mapper/dm-raid.txt | 44
-rw-r--r-- Documentation/devicetree/bindings/arc/interrupts.txt | 24
-rw-r--r-- Documentation/devicetree/bindings/arm/armadeus.txt | 6
-rw-r--r-- Documentation/devicetree/bindings/arm/atmel-adc.txt | 13
-rw-r--r-- Documentation/devicetree/bindings/arm/fsl.txt | 8
-rw-r--r-- Documentation/devicetree/bindings/arm/msm/ssbi.txt | 18
-rw-r--r-- Documentation/devicetree/bindings/arm/samsung/exynos-adc.txt | 60
-rw-r--r-- Documentation/devicetree/bindings/clock/axi-clkgen.txt | 22
-rw-r--r-- Documentation/devicetree/bindings/clock/fixed-factor-clock.txt | 24
-rw-r--r-- Documentation/devicetree/bindings/clock/imx5-clock.txt | 1
-rw-r--r-- Documentation/devicetree/bindings/clock/imx6q-clock.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/clock/silabs,si5351.txt | 114
-rw-r--r-- Documentation/devicetree/bindings/clock/sunxi.txt | 151
-rw-r--r-- Documentation/devicetree/bindings/dma/snps-dma.txt | 70
-rw-r--r-- Documentation/devicetree/bindings/gpio/gpio.txt | 6
-rw-r--r-- Documentation/devicetree/bindings/hwmon/ntc_thermistor.txt | 29
-rw-r--r-- Documentation/devicetree/bindings/iio/iio-bindings.txt | 97
-rw-r--r-- Documentation/devicetree/bindings/media/coda.txt | 30
-rw-r--r-- Documentation/devicetree/bindings/metag/meta-intc.txt | 82
-rw-r--r-- Documentation/devicetree/bindings/mfd/ab8500.txt | 6
-rw-r--r-- Documentation/devicetree/bindings/mfd/mc13xxx.txt | 36
-rw-r--r-- Documentation/devicetree/bindings/mips/cpu_irq.txt | 47
-rw-r--r-- Documentation/devicetree/bindings/misc/sram.txt | 16
-rw-r--r-- Documentation/devicetree/bindings/mtd/elm.txt | 16
-rw-r--r-- Documentation/devicetree/bindings/mtd/mtd-physmap.txt | 3
-rw-r--r-- Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt | 109
-rw-r--r-- Documentation/devicetree/bindings/pinctrl/samsung-pinctrl.txt | 3
-rw-r--r-- Documentation/devicetree/bindings/regulator/max8952.txt | 52
-rw-r--r-- Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt | 15
-rw-r--r-- Documentation/devicetree/bindings/serial/lantiq_asc.txt | 16
-rw-r--r-- Documentation/devicetree/bindings/spi/brcm,bcm2835-spi.txt | 22
-rw-r--r-- Documentation/devicetree/bindings/spi/fsl-spi.txt | 3
-rw-r--r-- Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt | 26
-rw-r--r-- Documentation/devicetree/bindings/spi/spi-samsung.txt | 8
-rw-r--r-- Documentation/devicetree/bindings/staging/dwc2.txt | 15
-rw-r--r-- Documentation/devicetree/bindings/staging/imx-drm/fsl-imx-drm.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/thermal/dove-thermal.txt | 18
-rw-r--r-- Documentation/devicetree/bindings/thermal/kirkwood-thermal.txt | 15
-rw-r--r-- Documentation/devicetree/bindings/thermal/rcar-thermal.txt | 29
-rw-r--r-- Documentation/devicetree/bindings/timer/marvell,armada-370-xp-timer.txt (renamed from Documentation/devicetree/bindings/arm/armada-370-xp-timer.txt) | 11
-rw-r--r-- Documentation/devicetree/bindings/tty/serial/of-serial.txt | 7
-rw-r--r-- Documentation/devicetree/bindings/usb/ci13xxx-imx.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/usb/ehci-omap.txt | 32
-rw-r--r-- Documentation/devicetree/bindings/usb/ohci-omap3.txt | 15
-rw-r--r-- Documentation/devicetree/bindings/usb/omap-usb.txt | 40
-rw-r--r-- Documentation/devicetree/bindings/usb/samsung-usbphy.txt | 76
-rw-r--r-- Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt | 34
-rw-r--r-- Documentation/devicetree/bindings/vendor-prefixes.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/video/backlight/lp855x.txt | 41
-rw-r--r-- Documentation/devicetree/bindings/video/backlight/tps65217-backlight.txt | 27
-rw-r--r-- Documentation/devicetree/bindings/video/via,vt8500-fb.txt | 48
-rw-r--r-- Documentation/devicetree/bindings/video/wm,wm8505-fb.txt | 32
-rw-r--r-- Documentation/devicetree/bindings/w1/fsl-imx-owire.txt | 19
-rw-r--r-- Documentation/devicetree/bindings/watchdog/atmel-at91rm9200-wdt.txt | 9
-rw-r--r-- Documentation/devicetree/bindings/watchdog/atmel-wdt.txt | 4
-rw-r--r-- Documentation/devicetree/bindings/watchdog/marvel.txt | 5
-rw-r--r-- Documentation/devicetree/bindings/watchdog/pnx4008-wdt.txt | 4
-rw-r--r-- Documentation/devicetree/bindings/watchdog/qca-ar7130-wdt.txt | 13
-rw-r--r-- Documentation/devicetree/bindings/watchdog/samsung-wdt.txt | 3
-rw-r--r-- Documentation/dma-buf-sharing.txt | 6
-rw-r--r-- Documentation/filesystems/vfat.txt | 26
-rw-r--r-- Documentation/hwmon/adm1275 | 2
-rw-r--r-- Documentation/hwmon/adt7410 | 50
-rw-r--r-- Documentation/hwmon/jc42 | 2
-rw-r--r-- Documentation/hwmon/lineage-pem | 2
-rw-r--r-- Documentation/hwmon/lm25066 | 36
-rw-r--r-- Documentation/hwmon/lm75 | 2
-rw-r--r-- Documentation/hwmon/lm95234 | 36
-rw-r--r-- Documentation/hwmon/ltc2978 | 149
-rw-r--r-- Documentation/hwmon/ltc4261 | 2
-rw-r--r-- Documentation/hwmon/max16064 | 2
-rw-r--r-- Documentation/hwmon/max16065 | 2
-rw-r--r-- Documentation/hwmon/max34440 | 2
-rw-r--r-- Documentation/hwmon/max8688 | 2
-rw-r--r-- Documentation/hwmon/nct6775 | 188
-rw-r--r-- Documentation/hwmon/pmbus | 2
-rw-r--r-- Documentation/hwmon/sht15 | 2
-rw-r--r-- Documentation/hwmon/smm665 | 2
-rw-r--r-- Documentation/hwmon/tmp401 | 25
-rw-r--r-- Documentation/hwmon/ucd9000 | 2
-rw-r--r-- Documentation/hwmon/ucd9200 | 2
-rw-r--r-- Documentation/hwmon/zl6100 | 4
-rw-r--r-- Documentation/i2c/busses/i2c-diolan-u2c | 2
-rw-r--r-- Documentation/ia64/err_inject.txt | 2
-rw-r--r-- Documentation/input/alps.txt | 67
-rw-r--r-- Documentation/ioctl/ioctl-number.txt | 1
-rw-r--r-- Documentation/kdump/kdump.txt | 1
-rw-r--r-- Documentation/kernel-parameters.txt | 128
-rw-r--r-- Documentation/metag/00-INDEX | 4
-rw-r--r-- Documentation/metag/kernel-ABI.txt | 256
-rw-r--r-- Documentation/misc-devices/mei/mei-client-bus.txt | 138
-rw-r--r-- Documentation/networking/ipvs-sysctl.txt | 7
-rw-r--r-- Documentation/networking/tuntap.txt | 77
-rw-r--r-- Documentation/pinctrl.txt | 112
-rw-r--r-- Documentation/power/opp.txt | 25
-rw-r--r-- Documentation/printk-formats.txt | 2
-rw-r--r-- Documentation/s390/s390dbf.txt | 3
-rw-r--r-- Documentation/scsi/ChangeLog.megaraid_sas | 9
-rw-r--r-- Documentation/scsi/LICENSE.qla2xxx | 2
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt | 7
-rw-r--r-- Documentation/sound/alsa/seq_oss.html | 2
-rw-r--r-- Documentation/sysctl/vm.txt | 50
-rw-r--r-- Documentation/thermal/exynos_thermal_emulation | 53
-rw-r--r-- Documentation/thermal/intel_powerclamp.txt | 307
-rw-r--r-- Documentation/thermal/sysfs-api.txt | 18
-rw-r--r-- Documentation/this_cpu_ops.txt | 205
-rw-r--r-- Documentation/trace/ftrace.txt | 2099
-rw-r--r-- Documentation/trace/uprobetracer.txt | 114
-rw-r--r-- Documentation/usb/power-management.txt | 10
-rw-r--r-- Documentation/vm/overcommit-accounting | 8
-rw-r--r-- Documentation/watchdog/watchdog-kernel-api.txt | 14
132 files changed, 5716 insertions, 1030 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 0f3e8bbab8d7..45b3df936d2f 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -299,6 +299,8 @@ memory-hotplug.txt
- Hotpluggable memory support, how to use and current status.
memory.txt
- info on typical Linux memory problems.
+metag/
+ - directory with info about Linux on Meta architecture.
mips/
- directory with info about Linux on MIPS architecture.
misc-devices/
diff --git a/Documentation/ABI/testing/sysfs-bus-fcoe b/Documentation/ABI/testing/sysfs-bus-fcoe
index 50e2a80ea28f..21640eaad371 100644
--- a/Documentation/ABI/testing/sysfs-bus-fcoe
+++ b/Documentation/ABI/testing/sysfs-bus-fcoe
@@ -1,14 +1,53 @@
-What: /sys/bus/fcoe/ctlr_X
+What: /sys/bus/fcoe/
+Date: August 2012
+KernelVersion: TBD
+Contact: Robert Love <robert.w.love@intel.com>, devel@open-fcoe.org
+Description: The FCoE bus. Attributes in this directory are control interfaces.
+Attributes:
+
+ ctlr_create: 'FCoE Controller' instance creation interface. Writing an
+ <ifname> to this file will allocate and populate sysfs with a
+ fcoe_ctlr_device (ctlr_X). The user can then configure any
+ per-port settings and finally write to the fcoe_ctlr_device's
+ 'start' attribute to begin the kernel's discovery and login
+ process.
+
+ ctlr_destroy: 'FCoE Controller' instance removal interface. Writing a
+ fcoe_ctlr_device's sysfs name to this file will log the
+ fcoe_ctlr_device out of the fabric or otherwise connected
+ FCoE devices. It will also free all kernel memory allocated
+ for this fcoe_ctlr_device and any structures associated
+ with it, including the scsi_host.
+
+What: /sys/bus/fcoe/devices/ctlr_X
Date: March 2012
KernelVersion: TBD
Contact: Robert Love <robert.w.love@intel.com>, devel@open-fcoe.org
-Description: 'FCoE Controller' instances on the fcoe bus
+Description: 'FCoE Controller' instances on the fcoe bus.
+ The FCoE Controller now has a three-stage creation process:
+ 1) Write the interface name to ctlr_create. 2) Configure the
+ FCoE Controller (ctlr_X). 3) Enable the FCoE Controller to
+ begin discovery and login. The FCoE Controller is destroyed
+ by writing its name, i.e. ctlr_X, to the ctlr_destroy file.
+
Attributes:
fcf_dev_loss_tmo: Device loss timeout period (see below). Changing
this value will change the dev_loss_tmo for all
FCFs discovered by this controller.
+ mode: Display or change the FCoE Controller's mode. Possible
+ modes are 'Fabric' and 'VN2VN'. If a FCoE Controller
+ is started in 'Fabric' mode then FIP FCF discovery is
+ initiated and ultimately a fabric login is attempted.
+ If a FCoE Controller is started in 'VN2VN' mode then
+ FIP VN2VN discovery and login is performed. A FCoE
+ Controller only supports one mode at a time.
+
+ enabled: Whether an FCoE controller is enabled or disabled.
+ Reads 0 if disabled, 1 if enabled. Writing 0 to this file
+ disables the FCoE controller; writing 1 enables it.
+
lesb/link_fail: Link Error Status Block (LESB) link failure count.
lesb/vlink_fail: Link Error Status Block (LESB) virtual link
@@ -26,7 +65,7 @@ Attributes:
Notes: ctlr_X (global increment starting at 0)
-What: /sys/bus/fcoe/fcf_X
+What: /sys/bus/fcoe/devices/fcf_X
Date: March 2012
KernelVersion: TBD
Contact: Robert Love <robert.w.love@intel.com>, devel@open-fcoe.org
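The three-stage creation process described in this file can be sketched from
userspace as follows. This is only an illustration under stated assumptions,
not part of the patch: the helper names, the sysfs_root parameter, and the
idea of passing the controller name (for example ctlr_0) explicitly are
hypothetical. Only the attribute names (ctlr_create, ctlr_destroy, mode,
enabled) and the mode values ('Fabric', 'VN2VN') come from the text above.

```python
from pathlib import Path

# Mode names as listed in the ABI text.
VALID_MODES = ("Fabric", "VN2VN")


def write_attr(path: Path, value: str) -> None:
    """Write a single value to a sysfs-style attribute file."""
    path.write_text(value)


def create_fcoe_controller(ifname: str, ctlr: str, mode: str = "Fabric",
                           sysfs_root: Path = Path("/sys/bus/fcoe")) -> None:
    """Create, configure and enable one FCoE controller.

    `ctlr` is the controller's sysfs name (hypothetical example: "ctlr_0");
    this sketch does not try to discover which name the kernel assigned.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {VALID_MODES}")

    # Stage 1: write the interface name to ctlr_create.
    write_attr(sysfs_root / "ctlr_create", ifname)

    # Stage 2: configure the controller before discovery starts.
    ctlr_dir = sysfs_root / "devices" / ctlr
    write_attr(ctlr_dir / "mode", mode)

    # Stage 3: enable the controller so discovery and login begin.
    write_attr(ctlr_dir / "enabled", "1")


def destroy_fcoe_controller(ctlr: str,
                            sysfs_root: Path = Path("/sys/bus/fcoe")) -> None:
    """Destroy a controller by writing its sysfs name to ctlr_destroy."""
    write_attr(sysfs_root / "ctlr_destroy", ctlr)
```

Keeping sysfs_root as a parameter lets the sequence be exercised against a
scratch directory; on a real system the writes need sufficient privileges and
an FCoE-capable interface.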
diff --git a/Documentation/ABI/testing/sysfs-bus-mei b/Documentation/ABI/testing/sysfs-bus-mei
new file mode 100644
index 000000000000..2066f0bbd453
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-mei
@@ -0,0 +1,7 @@
+What: /sys/bus/mei/devices/.../modalias
+Date: March 2013
+KernelVersion: 3.10
+Contact: Samuel Ortiz <sameo@linux.intel.com>
+ linux-mei@linux.intel.com
+Description: Stores the same MODALIAS value emitted by uevent
+ Format: mei:<mei device name>
diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index c8baaf53594a..f093e59cbe5f 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -32,7 +32,7 @@ Date: January 2008
KernelVersion: 2.6.25
Contact: Sarah Sharp <sarah.a.sharp@intel.com>
Description:
- If CONFIG_PM and CONFIG_USB_SUSPEND are enabled, then this file
+ If CONFIG_PM_RUNTIME is enabled then this file
is present. When read, it returns the total time (in msec)
that the USB device has been connected to the machine. This
file is read-only.
@@ -45,7 +45,7 @@ Date: January 2008
KernelVersion: 2.6.25
Contact: Sarah Sharp <sarah.a.sharp@intel.com>
Description:
- If CONFIG_PM and CONFIG_USB_SUSPEND are enabled, then this file
+ If CONFIG_PM_RUNTIME is enabled then this file
is present. When read, it returns the total time (in msec)
that the USB device has been active, i.e. not in a suspended
state. This file is read-only.
@@ -187,7 +187,7 @@ What: /sys/bus/usb/devices/.../power/usb2_hardware_lpm
Date: September 2011
Contact: Andiry Xu <andiry.xu@amd.com>
Description:
- If CONFIG_USB_SUSPEND is set and a USB 2.0 lpm-capable device
+ If CONFIG_PM_RUNTIME is set and a USB 2.0 lpm-capable device
is plugged in to a xHCI host which support link PM, it will
perform a LPM test; if the test is passed and host supports
USB2 hardware LPM (xHCI 1.0 feature), USB2 hardware LPM will
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 9c978dcae07d..2447698aed41 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -173,3 +173,15 @@ Description: Processor frequency boosting control
Boosting allows the CPU and the firmware to run at a frequency
beyond its nominal limit.
More details can be found in Documentation/cpu-freq/boost.txt
+
+
+What: /sys/devices/system/cpu/cpu#/crash_notes
+ /sys/devices/system/cpu/cpu#/crash_notes_size
+Date: April 2013
+Contact: kexec@lists.infradead.org
+Description: address and size of the percpu note.
+
+ crash_notes: the physical address of the memory that holds the
+ note of cpu#.
+
+ crash_notes_size: size of the note of cpu#.
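A tool that wants the location of a CPU's crash note could read the two
attributes above roughly as follows. This is a sketch, not part of the patch.
The function name and the cpu_root parameter are hypothetical, and the number
formats are assumptions that the ABI text does not state: the address is
parsed as hexadecimal and the size as decimal.

```python
from pathlib import Path


def read_crash_note_location(cpu: int,
                             cpu_root: Path = Path("/sys/devices/system/cpu")):
    """Return (physical_address, size) of the crash note for one CPU.

    Assumption: crash_notes holds a hexadecimal address and
    crash_notes_size a decimal byte count.
    """
    cpu_dir = cpu_root / f"cpu{cpu}"
    address = int((cpu_dir / "crash_notes").read_text().strip(), 16)
    size = int((cpu_dir / "crash_notes_size").read_text().strip(), 10)
    return address, size
```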
diff --git a/Documentation/ABI/testing/sysfs-platform-msi-laptop b/Documentation/ABI/testing/sysfs-platform-msi-laptop
new file mode 100644
index 000000000000..307a247ba1ef
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-platform-msi-laptop
@@ -0,0 +1,83 @@
+What: /sys/devices/platform/msi-laptop-pf/lcd_level
+Date: Oct 2006
+KernelVersion: 2.6.19
+Contact: "Lennart Poettering <mzxreary@0pointer.de>"
+Description:
+ Screen brightness: contains a single integer in the range 0..8.
+
+What: /sys/devices/platform/msi-laptop-pf/auto_brightness
+Date: Oct 2006
+KernelVersion: 2.6.19
+Contact: "Lennart Poettering <mzxreary@0pointer.de>"
+Description:
+ Enable automatic brightness control: contains either 0 or 1. If
+ set to 1 the hardware adjusts the screen brightness
+ automatically when the power cord is plugged/unplugged.
+
+What: /sys/devices/platform/msi-laptop-pf/wlan
+Date: Oct 2006
+KernelVersion: 2.6.19
+Contact: "Lennart Poettering <mzxreary@0pointer.de>"
+Description:
+ WLAN subsystem enabled: contains either 0 or 1.
+
+What: /sys/devices/platform/msi-laptop-pf/bluetooth
+Date: Oct 2006
+KernelVersion: 2.6.19
+Contact: "Lennart Poettering <mzxreary@0pointer.de>"
+Description:
+ Bluetooth subsystem enabled: contains either 0 or 1. Please
+ note that this file is constantly 0 if no Bluetooth hardware is
+ available.
+
+What: /sys/devices/platform/msi-laptop-pf/touchpad
+Date: Nov 2012
+KernelVersion: 3.8
+Contact: "Maxim Mikityanskiy <maxtram95@gmail.com>"
+Description:
+ Contains either 0 or 1 and indicates if touchpad is turned on.
+ Touchpad state can only be toggled by pressing Fn+F3.
+
+What: /sys/devices/platform/msi-laptop-pf/turbo_mode
+Date: Nov 2012
+KernelVersion: 3.8
+Contact: "Maxim Mikityanskiy <maxtram95@gmail.com>"
+Description:
+ Contains either 0 or 1 and indicates if turbo mode is turned
+ on. In turbo mode power LED is orange and processor is
+ overclocked. Turbo mode is available only if charging. It is
+ only possible to toggle turbo mode state by pressing Fn+F10,
+ and there is a cooldown of a few seconds between subsequent
+ toggles. If the user presses Fn+F10 too frequently, the turbo
+ mode state is not changed.
+
+What: /sys/devices/platform/msi-laptop-pf/eco_mode
+Date: Nov 2012
+KernelVersion: 3.8
+Contact: "Maxim Mikityanskiy <maxtram95@gmail.com>"
+Description:
+ Contains either 0 or 1 and indicates if ECO mode is turned on.
+ In ECO mode power LED is green and userspace should do some
+ powersaving actions. ECO mode is available only on battery
+ power. ECO mode can only be toggled by pressing Fn+F10.
+
+What: /sys/devices/platform/msi-laptop-pf/turbo_cooldown
+Date: Nov 2012
+KernelVersion: 3.8
+Contact: "Maxim Mikityanskiy <maxtram95@gmail.com>"
+Description:
+ Contains value in range 0..3:
+ * 0 -> Turbo mode is off
+ * 1 -> Turbo mode is on, cannot be turned off yet
+ * 2 -> Turbo mode is off, cannot be turned on yet
+ * 3 -> Turbo mode is on
+
+What: /sys/devices/platform/msi-laptop-pf/auto_fan
+Date: Nov 2012
+KernelVersion: 3.8
+Contact: "Maxim Mikityanskiy <maxtram95@gmail.com>"
+Description:
+ Contains either 0 or 1 and indicates if fan speed is controlled
+ automatically (1) or fan runs at maximal speed (0). Can be
+ toggled in software.
+
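The turbo_cooldown values listed above lend themselves to a small lookup
table. The following sketch decodes the raw attribute contents; the function
and constant names are hypothetical, and the descriptions are copied from the
list in this file.

```python
# Meaning of each turbo_cooldown value, as listed in the ABI text.
TURBO_COOLDOWN_STATES = {
    0: "Turbo mode is off",
    1: "Turbo mode is on, cannot be turned off yet",
    2: "Turbo mode is off, cannot be turned on yet",
    3: "Turbo mode is on",
}


def describe_turbo_cooldown(raw: str) -> str:
    """Translate the raw contents of turbo_cooldown into its description."""
    value = int(raw.strip())
    if value not in TURBO_COOLDOWN_STATES:
        raise ValueError(f"turbo_cooldown out of range 0..3: {value}")
    return TURBO_COOLDOWN_STATES[value]


def turbo_is_on(raw: str) -> bool:
    """Values 1 and 3 are the two states in which turbo mode is on."""
    value = int(raw.strip())
    if value not in TURBO_COOLDOWN_STATES:
        raise ValueError(f"turbo_cooldown out of range 0..3: {value}")
    return value in (1, 3)
```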
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 7514dbf0a679..c36892c072da 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -227,7 +227,7 @@ X!Isound/sound_firmware.c
<chapter id="uart16x50">
<title>16x50 UART Driver</title>
!Edrivers/tty/serial/serial_core.c
-!Edrivers/tty/serial/8250/8250.c
+!Edrivers/tty/serial/8250/8250_core.c
</chapter>
<chapter id="fbdev">
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 31ef8fe07f82..79e789b8b8ea 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -217,9 +217,14 @@ over a rather long period of time, but improvements are always welcome!
whether the increased speed is worth it.
8. Although synchronize_rcu() is slower than is call_rcu(), it
- usually results in simpler code. So, unless update performance
- is critically important or the updaters cannot block,
- synchronize_rcu() should be used in preference to call_rcu().
+ usually results in simpler code. So, unless update performance is
+ critically important, the updaters cannot block, or the latency of
+ synchronize_rcu() is visible from userspace, synchronize_rcu()
+ should be used in preference to call_rcu(). Furthermore,
+ kfree_rcu() usually results in even simpler code than does
+ synchronize_rcu() without synchronize_rcu()'s multi-millisecond
+ latency. So please take advantage of kfree_rcu()'s "fire and
+ forget" memory-freeing capabilities where it applies.
An especially important property of the synchronize_rcu()
primitive is that it automatically self-limits: if grace periods
@@ -268,7 +273,8 @@ over a rather long period of time, but improvements are always welcome!
e. Periodically invoke synchronize_rcu(), permitting a limited
number of updates per grace period.
- The same cautions apply to call_rcu_bh() and call_rcu_sched().
+ The same cautions apply to call_rcu_bh(), call_rcu_sched(),
+ call_srcu(), and kfree_rcu().
9. All RCU list-traversal primitives, which include
rcu_dereference(), list_for_each_entry_rcu(), and
@@ -296,9 +302,9 @@ over a rather long period of time, but improvements are always welcome!
all currently executing rcu_read_lock()-protected RCU read-side
critical sections complete. It does -not- necessarily guarantee
that all currently running interrupts, NMIs, preempt_disable()
- code, or idle loops will complete. Therefore, if you do not have
- rcu_read_lock()-protected read-side critical sections, do -not-
- use synchronize_rcu().
+ code, or idle loops will complete. Therefore, if your
+ read-side critical sections are protected by something other
+ than rcu_read_lock(), do -not- use synchronize_rcu().
Similarly, disabling preemption is not an acceptable substitute
for rcu_read_lock(). Code that attempts to use preemption
@@ -401,9 +407,9 @@ over a rather long period of time, but improvements are always welcome!
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.
-17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
- the __rcu sparse checks to validate your RCU code. These
- can help find problems as follows:
+17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
+ __rcu sparse checks (enabled by CONFIG_SPARSE_RCU_POINTER) to
+ validate your RCU code. These can help find problems as follows:
CONFIG_PROVE_RCU: check that accesses to RCU-protected data
structures are carried out under the proper RCU
diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt
index a102d4b3724b..cd83d2348fef 100644
--- a/Documentation/RCU/lockdep.txt
+++ b/Documentation/RCU/lockdep.txt
@@ -64,6 +64,11 @@ checking of rcu_dereference() primitives:
but retain the compiler constraints that prevent duplicating
or coalescing. This is useful when testing the
value of the pointer itself, for example, against NULL.
+ rcu_access_index(idx):
+ Return the value of the index and omit all barriers, but
+ retain the compiler constraints that prevent duplicating
+ or coalescing. This is useful when testing the
+ value of the index itself, for example, against -1.
The rcu_dereference_check() check expression can be any boolean
expression, but would normally include a lockdep expression. However,
diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.txt
index 38428c125135..2e319d1b9ef2 100644
--- a/Documentation/RCU/rcubarrier.txt
+++ b/Documentation/RCU/rcubarrier.txt
@@ -79,7 +79,20 @@ complete. Pseudo-code using rcu_barrier() is as follows:
2. Execute rcu_barrier().
3. Allow the module to be unloaded.
-The rcutorture module makes use of rcu_barrier in its exit function
+There are also rcu_barrier_bh(), rcu_barrier_sched(), and srcu_barrier()
+functions for the other flavors of RCU, and you of course must match
+the flavor of rcu_barrier() with that of call_rcu(). If your module
+uses multiple flavors of call_rcu(), then it must also use multiple
+flavors of rcu_barrier() when unloading that module. For example, if
+it uses call_rcu_bh(), call_srcu() on srcu_struct_1, and call_srcu() on
+srcu_struct_2, then the following three lines of code will be required
+when unloading:
+
+ 1 rcu_barrier_bh();
+ 2 srcu_barrier(&srcu_struct_1);
+ 3 srcu_barrier(&srcu_struct_2);
+
+The rcutorture module makes use of rcu_barrier() in its exit function
as follows:
1 static void
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 1927151b386b..e38b8df3d727 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -92,14 +92,14 @@ If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
more information is printed with the stall-warning message, for example:
INFO: rcu_preempt detected stall on CPU
- 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0
+ 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
(t=65000 jiffies)
In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
printed:
INFO: rcu_preempt detected stall on CPU
- 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending
+ 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
(t=65000 jiffies)
The "(64628 ticks this GP)" indicates that this CPU has taken more
@@ -116,13 +116,28 @@ number between the two "/"s is the value of the nesting, which will
be a small positive number if in the idle loop and a very large positive
number (as shown above) otherwise.
-For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is
-not in the process of trying to force itself into dyntick-idle state, the
-"." indicates that the CPU has not given up forcing RCU into dyntick-idle
-mode (it would be "H" otherwise), and the "timer not pending" indicates
-that the CPU has not recently forced RCU into dyntick-idle mode (it
-would otherwise indicate the number of microseconds remaining in this
-forced state).
+The "softirq=" portion of the message tracks the number of RCU softirq
+handlers that the stalled CPU has executed. The number before the "/"
+is the number that had executed since boot at the time that this CPU
+last noted the beginning of a grace period, which might be the current
+(stalled) grace period, or it might be some earlier grace period (for
+example, if the CPU has been in dyntick-idle mode for an extended
+time period). The number after the "/" is the number that have executed
+since boot until the current time. If this latter number stays constant
+across repeated stall-warning messages, it is possible that RCU's softirq
+handlers are no longer able to execute on this CPU. This can happen if
+the stalled CPU is spinning with interrupts disabled, or, in -rt
+kernels, if a high-priority process is starving RCU's softirq handler.
+
+For CONFIG_RCU_FAST_NO_HZ kernels, the "last_accelerate:" prints the
+low-order 16 bits (in hex) of the jiffies counter when this CPU last
+invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked
+rcu_accelerate_cbs() from rcu_prepare_for_idle(). The "nonlazy_posted:"
+prints the number of non-lazy callbacks posted since the last call to
+rcu_needs_cpu(). Finally, an "L" indicates that there are currently
+no non-lazy callbacks ("." is printed otherwise, as shown above) and
+"D" indicates that dyntick-idle processing is enabled ("." is printed
+otherwise, for example, if disabled via the "nohz=" kernel boot parameter).
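The per-CPU stall-warning line described above can be picked apart with a
regular expression. The sketch below is a log-analysis aid written for this
explanation, not kernel code: the pattern is derived only from the two example
lines in this file, and the function and key names are hypothetical. Real
kernel output may differ between versions.

```python
import re

# Pattern built from the example lines shown in stallwarn.txt. The
# "softirq=" and CONFIG_RCU_FAST_NO_HZ parts are optional so that
# shorter lines still match.
STALL_LINE = re.compile(
    r"(?P<cpu>\d+): \((?P<ticks>\d+) ticks this GP\) "
    r"idle=(?P<idle>[0-9a-f]+/(?P<nesting>[0-9a-f]+)/[0-9a-f]+)"
    r"(?: softirq=(?P<softirq_gp_start>\d+)/(?P<softirq_now>\d+))?"
    r"(?: last_accelerate: (?P<last_accelerate>[0-9a-f]+/[0-9a-f]+)"
    r" nonlazy_posted: (?P<nonlazy_posted>\d+) (?P<flags>[L.][D.]))?"
)


def parse_stall_line(line):
    """Return a dict of fields from one stall-warning line, or None."""
    match = STALL_LINE.search(line)
    if match is None:
        return None

    def to_int(name, base=10):
        text = match.group(name)
        return None if text is None else int(text, base)

    flags = match.group("flags")
    return {
        "cpu": to_int("cpu"),
        "ticks": to_int("ticks"),
        # The value between the two "/"s of idle= is the nesting (hex).
        "nesting": to_int("nesting", 16),
        # Softirq handler counts: at grace-period start, and now.
        "softirq_gp_start": to_int("softirq_gp_start"),
        "softirq_now": to_int("softirq_now"),
        "last_accelerate": match.group("last_accelerate"),
        "nonlazy_posted": to_int("nonlazy_posted"),
        # "L": no non-lazy callbacks; "D": dyntick-idle processing enabled.
        "no_nonlazy_callbacks": None if flags is None else flags[0] == "L",
        "dyntick_idle_enabled": None if flags is None else flags[1] == "D",
    }


def softirq_made_progress(earlier, later):
    """True if the softirq count advanced between two parsed warnings."""
    return later["softirq_now"] > earlier["softirq_now"]
```

Comparing softirq_now across successive warnings for the same CPU is the
check suggested by the text: a count that never advances hints that RCU's
softirq handlers can no longer run there.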
Multiple Warnings From One Stall
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 0cc7820967f4..10df0b82f459 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -265,9 +265,9 @@ rcu_dereference()
rcu_read_lock();
p = rcu_dereference(head.next);
rcu_read_unlock();
- x = p->address;
+ x = p->address; /* BUG!!! */
rcu_read_lock();
- y = p->data;
+ y = p->data; /* BUG!!! */
rcu_read_unlock();
Holding a reference from one RCU read-side critical section
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index c379a2a6949f..6e97e73d87b5 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -60,8 +60,7 @@ own source tree. For example:
"dontdiff" is a list of files which are generated by the kernel during
the build process, and should be ignored in any diff(1)-generated
patch. The "dontdiff" file is included in the kernel tree in
-2.6.12 and later. For earlier kernel versions, you can get it
-from <http://www.xenotime.net/linux/doc/dontdiff>.
+2.6.12 and later.
Make sure your patch does not include any extra files which do not
belong in a patch submission. Make sure to review your patch -after-
@@ -421,7 +420,7 @@ person it names. This tag documents that potentially interested parties
have been included in the discussion
-14) Using Reported-by:, Tested-by: and Reviewed-by:
+14) Using Reported-by:, Tested-by:, Reviewed-by: and Suggested-by:
If this patch fixes a problem reported by somebody else, consider adding a
Reported-by: tag to credit the reporter for their contribution. Please
@@ -469,6 +468,13 @@ done on the patch. Reviewed-by: tags, when supplied by reviewers known to
understand the subject area and to perform thorough reviews, will normally
increase the likelihood of your patch getting into the kernel.
+A Suggested-by: tag indicates that the patch idea is suggested by the person
+named and ensures credit to the person for the idea. Please note that this
+tag should not be added without the reporter's permission, especially if the
+idea was not posted in a public forum. That said, if we diligently credit our
+idea reporters, they will, hopefully, be inspired to help us again in the
+future.
+
15) The canonical patch format
diff --git a/Documentation/arm/sunxi/clocks.txt b/Documentation/arm/sunxi/clocks.txt
new file mode 100644
index 000000000000..e09a88aa3136
--- /dev/null
+++ b/Documentation/arm/sunxi/clocks.txt
@@ -0,0 +1,56 @@
+Frequently asked questions about the sunxi clock system
+=======================================================
+
+This document contains useful bits of information that people tend to ask
+about the sunxi clock system, as well as accompanying ASCII art when adequate.
+
+Q: Why is the main 24MHz oscillator gatable? Wouldn't that break the
+ system?
+
+A: The 24MHz oscillator allows gating to save power. Indeed, if gated
+ carelessly the system would stop functioning, but with the right
+ steps, one can gate it and keep the system running. Consider this
+ simplified suspend example:
+
+ While the system is operational, you would see something like
+
+     24MHz          32kHz
+       |
+      PLL1
+       \
+        \_ CPU Mux
+              |
+            [CPU]
+
+ When you are about to suspend, you switch the CPU Mux to the 32kHz
+ oscillator:
+
+     24MHz          32kHz
+       |              |
+      PLL1            |
+                     /
+           CPU Mux _/
+              |
+            [CPU]
+
+ Finally you can gate the main oscillator
+
+                    32kHz
+                      |
+                      |
+                     /
+           CPU Mux _/
+              |
+            [CPU]
+
+Q: Where can I learn more about the sunxi clocks?
+
+A: The linux-sunxi wiki contains a page documenting the clock registers;
+ you can find it at
+
+ http://linux-sunxi.org/A10/CCM
+
+ The authoritative source for information at this time is the ccmu driver
+ released by Allwinner; you can find it at
+
+ https://github.com/linux-sunxi/linux-sunxi/tree/sunxi-3.0/arch/arm/mach-sun4i/clock/ccmu
diff --git a/Documentation/backlight/lp855x-driver.txt b/Documentation/backlight/lp855x-driver.txt
index 18b06ca038ea..1c732f0c6758 100644
--- a/Documentation/backlight/lp855x-driver.txt
+++ b/Documentation/backlight/lp855x-driver.txt
@@ -32,14 +32,10 @@ Platform data for lp855x
For supporting platform specific data, the lp855x platform data can be used.
* name : Backlight driver name. If it is not defined, default name is set.
-* mode : Brightness control mode. PWM or register based.
* device_control : Value of DEVICE CONTROL register.
* initial_brightness : Initial value of backlight brightness.
* period_ns : Platform specific PWM period value. unit is nano.
Only valid when brightness is pwm input mode.
-* load_new_rom_data :
- 0 : use default configuration data
- 1 : update values of eeprom or eprom registers on loading driver
* size_program : Total size of lp855x_rom_data.
* rom_data : List of new eeprom/eprom registers.
@@ -54,10 +50,8 @@ static struct lp855x_rom_data lp8552_eeprom_arr[] = {
static struct lp855x_platform_data lp8552_pdata = {
.name = "lcd-bl",
- .mode = REGISTER_BASED,
.device_control = I2C_CONFIG(LP8552),
.initial_brightness = INITIAL_BRT,
- .load_new_rom_data = 1,
.size_program = ARRAY_SIZE(lp8552_eeprom_arr),
.rom_data = lp8552_eeprom_arr,
};
@@ -65,7 +59,6 @@ static struct lp855x_platform_data lp8552_pdata = {
example 2) lp8556 platform data : pwm input mode with default rom data
static struct lp855x_platform_data lp8556_pdata = {
- .mode = PWM_BASED,
.device_control = PWM_CONFIG(LP8556),
.initial_brightness = INITIAL_BRT,
.period_ns = 1000000,
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index bcf1a00b06a1..638bf17ff869 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -442,7 +442,7 @@ You can attach the current shell task by echoing 0:
You can use the cgroup.procs file instead of the tasks file to move all
threads in a threadgroup at once. Echoing the PID of any task in a
threadgroup to cgroup.procs causes all tasks in that threadgroup to be
-be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
+attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
in the writing task's threadgroup.
Note: Since every task is always a member of exactly one cgroup in each
@@ -580,6 +580,7 @@ propagation along the hierarchy. See the comment on
cgroup_for_each_descendant_pre() for details.
void css_offline(struct cgroup *cgrp);
+(cgroup_mutex held by caller)
This is the counterpart of css_online() and called iff css_online()
has succeeded on @cgrp. This signifies the beginning of the end of
diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt
index 16624a7f8222..3c1095ca02ea 100644
--- a/Documentation/cgroups/devices.txt
+++ b/Documentation/cgroups/devices.txt
@@ -13,9 +13,7 @@ either an integer or * for all. Access is a composition of r
The root device cgroup starts with rwm to 'all'. A child device
cgroup gets a copy of the parent. Administrators can then remove
devices from the whitelist or add new entries. A child cgroup can
-never receive a device access which is denied by its parent. However
-when a device access is removed from a parent it will not also be
-removed from the child(ren).
+never receive a device access which is denied by its parent.
2. User Interface
@@ -50,3 +48,69 @@ task to a new cgroup. (Again we'll probably want to change that).
A cgroup may not be granted more permissions than the cgroup's
parent has.
+
+4. Hierarchy
+
+device cgroups maintain hierarchy by making sure a cgroup never has more
+access permissions than its parent. Every time an entry is written to
+a cgroup's devices.deny file, all its children will have that entry removed
+from their whitelist and all the locally set whitelist entries will be
+re-evaluated. In case one of the locally set whitelist entries would provide
+more access than the cgroup's parent, it'll be removed from the whitelist.
+
+Example:
+      A
+     / \
+        B
+
+    group        behavior    exceptions
+    A            allow       "b 8:* rwm", "c 116:1 rw"
+    B            deny        "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"
+
+If a device is denied in group A:
+ # echo "c 116:* r" > A/devices.deny
+it'll propagate down and after revalidating B's entries, the whitelist entry
+"c 116:2 rwm" will be removed:
+
+    group        whitelist entries                        denied devices
+    A            all                                      "b 8:* rwm", "c 116:* rw"
+    B            "c 1:3 rwm", "b 3:* rwm"                 all the rest
+
+In case parent's exceptions change and local exceptions are not allowed
+anymore, they'll be deleted.
+
+Notice that new whitelist entries will not be propagated:
+      A
+     / \
+        B
+
+    group        whitelist entries                        denied devices
+    A            "c 1:3 rwm", "c 1:5 r"                   all the rest
+    B            "c 1:3 rwm", "c 1:5 r"                   all the rest
+
+when adding "c *:3 rwm":
+ # echo "c *:3 rwm" >A/devices.allow
+
+the result:
+    group        whitelist entries                        denied devices
+    A            "c *:3 rwm", "c 1:5 r"                   all the rest
+    B            "c 1:3 rwm", "c 1:5 r"                   all the rest
+
+but now it'll be possible to add new entries to B:
+ # echo "c 2:3 rwm" >B/devices.allow
+ # echo "c 50:3 r" >B/devices.allow
+or even
+ # echo "c *:3 rwm" >B/devices.allow
+
+Allowing or denying all by writing 'a' to devices.allow or devices.deny will
+not be possible once the device cgroup has children.
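The rule that a child can only be granted what its parent already allows can be illustrated with a small model. This is a simplified sketch, not the kernel's implementation: the names `DevRule`, `parse_rule`, `covers` and `may_add` are hypothetical, and it assumes a new entry must be covered by a single parent entry.

```python
from typing import NamedTuple


class DevRule(NamedTuple):
    dev_type: str       # 'b', 'c', or 'a' (all)
    major: str          # device major as a string, or '*'
    minor: str          # device minor as a string, or '*'
    access: frozenset   # subset of {'r', 'w', 'm'}


def parse_rule(text):
    """Parse an entry such as 'c 1:3 rwm' (hypothetical helper)."""
    dev_type, numbers, access = text.split()
    major, minor = numbers.split(":")
    return DevRule(dev_type, major, minor, frozenset(access))


def covers(parent, child):
    """True if the parent entry grants at least what the child entry asks for."""
    type_ok = parent.dev_type in ("a", child.dev_type)
    major_ok = parent.major in ("*", child.major)
    minor_ok = parent.minor in ("*", child.minor)
    return type_ok and major_ok and minor_ok and child.access <= parent.access


def may_add(parent_whitelist, new_entry):
    """Simplified check: some single parent entry must cover the new entry."""
    new = parse_rule(new_entry)
    return any(covers(parse_rule(p), new) for p in parent_whitelist)


# Parent A after 'echo "c *:3 rwm" > A/devices.allow' in the example above.
parent_a = ["c *:3 rwm", "c 1:5 r"]
print(may_add(parent_a, "c 2:3 rwm"))   # covered by "c *:3 rwm"
print(may_add(parent_a, "c 2:4 r"))     # not covered by any parent entry
```

Under this model the three writes to B/devices.allow shown above are accepted only after A has been given "c *:3 rwm".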
+
+4.1 Hierarchy (internal implementation)
+
+device cgroups are implemented internally using a behavior (ALLOW, DENY) and a
+list of exceptions. The internal state is controlled using the same user
+interface to preserve compatibility with the previous whitelist-only
+implementation. Removal or addition of exceptions that will reduce the access
+to devices will be propagated down the hierarchy.
+For every propagated exception, the effective rules will be re-evaluated based
+on current parent's access rules.
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 8b8c28b9864c..f336ede58e62 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -40,6 +40,7 @@ Features:
- soft limit
- moving (recharging) account at moving a task is selectable.
- usage threshold notifier
+ - memory pressure notifier
- oom-killer disable knob and oom-notifier
- Root cgroup has no limit controls.
@@ -65,6 +66,7 @@ Brief summary of control files.
memory.stat # show various statistics
memory.use_hierarchy # set/show hierarchical account enabled
memory.force_empty # trigger forced move charge to parent
+ memory.pressure_level # set memory pressure notifications
memory.swappiness # set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness)
memory.move_charge_at_immigrate # set/show controls of moving charges
@@ -762,7 +764,73 @@ At reading, current status of OOM is shown.
under_oom 0 or 1 (if 1, the memory cgroup is under OOM, tasks may
be stopped.)
-11. TODO
+11. Memory Pressure
+
+The pressure level notifications can be used to monitor the memory
+allocation cost; based on the pressure, applications can implement
+different strategies of managing their memory resources. The pressure
+levels are defined as follows:
+
+The "low" level means that the system is reclaiming memory for new
+allocations. Monitoring this reclaiming activity might be useful for
+maintaining cache level. Upon notification, the program (typically
+"Activity Manager") might analyze vmstat and act in advance (e.g.
+shut down unimportant services early).
+
+The "medium" level means that the system is experiencing medium memory
+pressure; the system might be swapping, paging out active file caches,
+etc. Upon this event applications may decide to further analyze
+vmstat/zoneinfo/memcg or internal memory usage statistics and free any
+resources that can be easily reconstructed or re-read from a disk.
+
+The "critical" level means that the system is actively thrashing; it is
+about to run out of memory (OOM), or the in-kernel OOM killer is even
+about to trigger. Applications should do whatever they can to help the
+system. It might be too late to consult with vmstat or any other
+statistics, so it's advisable to take an immediate action.
+
+The events are propagated upward until the event is handled, i.e. the
+events are not pass-through. Here is what this means: for example you have
+three cgroups: A->B->C. Now you set up an event listener on cgroups A, B
+and C, and suppose group C experiences some pressure. In this situation,
+only group C will receive the notification, i.e. groups A and B will not
+receive it. This is done to avoid excessive "broadcasting" of messages,
+which disturbs the system and which is especially bad if we are low on
+memory or thrashing. So, organize the cgroups wisely, or propagate the
+events manually (or ask us to implement the pass-through events,
+explaining why you would need them).
+
+The file memory.pressure_level is only used to set up an eventfd. To
+register a notification, an application must:
+
+- create an eventfd using eventfd(2);
+- open memory.pressure_level;
+- write string like "<event_fd> <fd of memory.pressure_level> <level>"
+ to cgroup.event_control.
+
+The application will be notified through the eventfd when memory pressure
+is at the specified level (or higher). Read/write operations on
+memory.pressure_level are not implemented.
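The three registration steps above can be sketched in Python. This is a minimal illustration, assuming Python 3.10 or later on Linux (for `os.eventfd`) and a mounted cgroup v1 memory controller; the function names and the cgroup path in the usage comment are hypothetical.

```python
import os

LEVELS = ("low", "medium", "critical")


def build_event_control_line(event_fd, pressure_fd, level):
    """Format the string that is written to cgroup.event_control."""
    if level not in LEVELS:
        raise ValueError(f"unknown pressure level: {level!r}")
    return f"{event_fd} {pressure_fd} {level}"


def register_pressure_listener(cgroup_dir, level):
    """Register for notifications; returns (eventfd, pressure_level fd).

    Assumes cgroup_dir is a memory cgroup directory and that the caller
    has permission to write to cgroup.event_control.
    """
    event_fd = os.eventfd(0)                              # step 1: eventfd(2)
    pressure_fd = os.open(                                # step 2: open the file
        os.path.join(cgroup_dir, "memory.pressure_level"), os.O_RDONLY)
    line = build_event_control_line(event_fd, pressure_fd, level)
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
        ctl.write(line)                                   # step 3: register
    return event_fd, pressure_fd


# Hypothetical usage (blocks until the cgroup reaches the given level):
#   efd, pfd = register_pressure_listener("/sys/fs/cgroup/memory/foo", "low")
#   count = os.eventfd_read(efd)
```

Both descriptors should stay open for as long as notifications are wanted; each successful read of the eventfd returns the number of events signalled since the previous read.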
+
+Test:
+
+ Here is a small script example that makes a new cgroup, sets up a
+ memory limit, sets up a notification in the cgroup and then makes child
+ cgroup experience a critical pressure:
+
+ # cd /sys/fs/cgroup/memory/
+ # mkdir foo
+ # cd foo
+ # cgroup_event_listener memory.pressure_level low &
+ # echo 8000000 > memory.limit_in_bytes
+ # echo 8000000 > memory.memsw.limit_in_bytes
+ # echo $$ > tasks
+ # dd if=/dev/zero | read x
+
+ (Expect a bunch of notifications, and eventually, the oom-killer will
+ trigger.)
+
+12. TODO
1. Add support for accounting huge pages (as a separate controller)
2. Make per-cgroup scanner reclaim not-shared pages first
diff --git a/Documentation/clk.txt b/Documentation/clk.txt
index 1943fae014fd..b9911c27f496 100644
--- a/Documentation/clk.txt
+++ b/Documentation/clk.txt
@@ -174,9 +174,9 @@ int clk_foo_enable(struct clk_hw *hw)
};
Below is a matrix detailing which clk_ops are mandatory based upon the
-hardware capbilities of that clock. A cell marked as "y" means
+hardware capabilities of that clock. A cell marked as "y" means
mandatory, a cell marked as "n" implies that either including that
-callback is invalid or otherwise uneccesary. Empty cells are either
+callback is invalid or otherwise unnecessary. Empty cells are either
optional or must be evaluated on a case-by-case basis.
clock hardware characteristics
@@ -231,3 +231,14 @@ To better enforce this policy, always follow this simple rule: any
statically initialized clock data MUST be defined in a separate file
from the logic that implements its ops. Basically separate the logic
from the data and all is well.
+
+ Part 6 - Disabling clock gating of unused clocks
+
+Sometimes during development it can be useful to be able to bypass the
+default disabling of unused clocks. For example, if drivers aren't enabling
+clocks properly but rely on them being on from the bootloader, bypassing
+the disabling means that the driver will remain functional while the issues
+are sorted out.
+
+To bypass this disabling, include "clk_ignore_unused" in the bootargs to the
+kernel.
diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.txt
new file mode 100644
index 000000000000..d7c440b444cc
--- /dev/null
+++ b/Documentation/device-mapper/cache-policies.txt
@@ -0,0 +1,77 @@
+Guidance for writing policies
+=============================
+
+Try to keep transactionality out of it. The core is careful to
+avoid asking about anything that is migrating. This is a pain, but
+makes it easier to write the policies.
+
+Mappings are loaded into the policy at construction time.
+
+Every bio that is mapped by the target is referred to the policy.
+The policy can return a simple HIT or MISS or issue a migration.
+
+Currently there's no way for the policy to issue background work,
+e.g. to start writing back dirty blocks that are going to be evicted
+soon.
+
+Because we map bios, rather than requests it's easy for the policy
+to get fooled by many small bios. For this reason the core target
+issues periodic ticks to the policy. It's suggested that the policy
+doesn't update states (eg, hit counts) for a block more than once
+for each tick. The core ticks by watching bios complete, and so
+trying to see when the io scheduler has let the ios run.
+
+
+Overview of supplied cache replacement policies
+===============================================
+
+multiqueue
+----------
+
+This policy is the default.
+
+The multiqueue policy has two sets of 16 queues: one set for entries
+waiting for the cache and another one for those in the cache.
+Cache entries in the queues are aged based on logical time. Entry into
+the cache is based on variable thresholds and queue selection is based
+on hit count on entry. The policy aims to take different cache miss
+costs into account and to adjust to varying load patterns automatically.
+
+Message and constructor argument pairs are:
+ 'sequential_threshold <#nr_sequential_ios>' and
+ 'random_threshold <#nr_random_ios>'.
+
+The sequential threshold indicates the number of contiguous I/Os
+required before a stream is treated as sequential. The random threshold
+is the number of intervening non-contiguous I/Os that must be seen
+before the stream is treated as random again.
+
+The sequential and random thresholds default to 512 and 4 respectively.
+
+Large, sequential ios are probably better left on the origin device
+since spindles tend to have good bandwidth. The io_tracker counts
+contiguous I/Os to try to spot when the io is in one of these sequential
+modes.
+
+cleaner
+-------
+
+The cleaner writes back all dirty blocks in a cache to decommission it.
+
+Examples
+========
+
+The syntax for a table is:
+ cache <metadata dev> <cache dev> <origin dev> <block size>
+ <#feature_args> [<feature arg>]*
+ <policy> <#policy_args> [<policy arg>]*
+
+The syntax to send a message using the dmsetup command is:
+ dmsetup message <mapped device> 0 sequential_threshold 1024
+ dmsetup message <mapped device> 0 random_threshold 8
+
+Using dmsetup:
+ dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
+ /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
+ creates a 128GB large mapped device named 'blah' with the
+ sequential threshold set to 1024 and the random_threshold set to 8.
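The table syntax and the size arithmetic in the dmsetup example can be checked with a short sketch. The helper names below are hypothetical, and 512-byte sectors are assumed, as is usual for device-mapper tables.

```python
SECTOR_BYTES = 512  # device-mapper lengths are given in 512-byte sectors


def build_cache_table(length_sectors, metadata_dev, cache_dev, origin_dev,
                      block_size, features, policy, policy_args):
    """Assemble a cache target table line following the syntax above.

    features is a list of feature strings; policy_args is a dict of
    key/value pairs, so the emitted argument count is always even.
    """
    parts = ["0", str(length_sectors), "cache",
             metadata_dev, cache_dev, origin_dev, str(block_size),
             str(len(features)), *features,
             policy, str(2 * len(policy_args))]
    for key, value in policy_args.items():
        parts += [key, str(value)]
    return " ".join(parts)


def sectors_to_gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_BYTES / 2**30


table = build_cache_table(
    268435456, "/dev/sdb", "/dev/sdc", "/dev/sdd", 512, [],
    "mq", {"sequential_threshold": 1024, "random_threshold": 8})
print(table)
print(sectors_to_gib(268435456))
```

The generated string matches the table passed to `dmsetup create` above, and 268435456 sectors of 512 bytes works out to 128 GiB.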
diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt
new file mode 100644
index 000000000000..f50470abe241
--- /dev/null
+++ b/Documentation/device-mapper/cache.txt
@@ -0,0 +1,243 @@
+Introduction
+============
+
+dm-cache is a device mapper target written by Joe Thornber, Heinz
+Mauelshagen, and Mike Snitzer.
+
+It aims to improve performance of a block device (eg, a spindle) by
+dynamically migrating some of its data to a faster, smaller device
+(eg, an SSD).
+
+This device-mapper solution allows us to insert this caching at
+different levels of the dm stack, for instance above the data device for
+a thin-provisioning pool. Caching solutions that are integrated more
+closely with the virtual memory system should give better performance.
+
+The target reuses the metadata library used in the thin-provisioning
+library.
+
+The decision as to what data to migrate and when is left to a plug-in
+policy module. Several of these have been written as we experiment,
+and we hope other people will contribute others for specific io
+scenarios (eg. a vm image server).
+
+Glossary
+========
+
+ Migration - Movement of the primary copy of a logical block from one
+ device to the other.
+ Promotion - Migration from slow device to fast device.
+ Demotion - Migration from fast device to slow device.
+
+The origin device always contains a copy of the logical block, which
+may be out of date or kept in sync with the copy on the cache device
+(depending on policy).
+
+Design
+======
+
+Sub-devices
+-----------
+
+The target is constructed by passing three devices to it (along with
+other parameters detailed later):
+
+1. An origin device - the big, slow one.
+
+2. A cache device - the small, fast one.
+
+3. A small metadata device - records which blocks are in the cache,
+ which are dirty, and extra hints for use by the policy object.
+ This information could be put on the cache device, but having it
+ separate allows the volume manager to configure it differently,
+ e.g. as a mirror for extra robustness.
+
+Fixed block size
+----------------
+
+The origin is divided up into blocks of a fixed size. This block size
+is configurable when you first create the cache. Typically we've been
+using block sizes of 256k - 1024k.
+
+Having a fixed block size simplifies the target a lot. But it is
+something of a compromise. For instance, a small part of a block may be
+getting hit a lot, yet the whole block will be promoted to the cache.
+So large block sizes are bad because they waste cache space. And small
+block sizes are bad because they increase the amount of metadata (both
+in core and on disk).
+
+Writeback/writethrough
+----------------------
+
+The cache has two modes, writeback and writethrough.
+
+If writeback, the default, is selected then a write to a block that is
+cached will go only to the cache and the block will be marked dirty in
+the metadata.
+
+If writethrough is selected then a write to a cached block will not
+complete until it has hit both the origin and cache devices. Clean
+blocks should remain clean.
+
+A simple cleaner policy is provided, which will clean (write back) all
+dirty blocks in a cache. Useful for decommissioning a cache.
+
+Migration throttling
+--------------------
+
+Migrating data between the origin and cache device uses bandwidth.
+The user can set a throttle to prevent more than a certain amount of
+migration occurring at any one time. Currently we're not taking any
+account of normal io traffic going to the devices. More work needs
+doing here to avoid migrating during those peak io moments.
+
+For the time being, a message "migration_threshold <#sectors>"
+can be used to set the maximum number of sectors being migrated,
+the default being 204800 sectors (or 100MB).
+
+Updating on-disk metadata
+-------------------------
+
+On-disk metadata is committed every time a REQ_SYNC or REQ_FUA bio is
+written. If no such requests are made then commits will occur every
+second. This means the cache behaves like a physical disk that has a
+write cache (the same is true of the thin-provisioning target). If
+power is lost you may lose some recent writes. The metadata should
+always be consistent in spite of any crash.
+
+The 'dirty' state for a cache block changes far too frequently for us
+to keep updating it on the fly. So we treat it as a hint. In normal
+operation it will be written when the dm device is suspended. If the
+system crashes all cache blocks will be assumed dirty when restarted.
+
+Per-block policy hints
+----------------------
+
+Policy plug-ins can store a chunk of data per cache block. It's up to
+the policy how big this chunk is, but it should be kept small. Like the
+dirty flags this data is lost if there's a crash so a safe fallback
+value should always be possible.
+
+For instance, the 'mq' policy, which is currently the default policy,
+uses this facility to store the hit count of the cache blocks. If
+there's a crash this information will be lost, which means the cache
+may be less efficient until those hit counts are regenerated.
+
+Policy hints affect performance, not correctness.
+
+Policy messaging
+----------------
+
+Policies will have different tunables, specific to each one, so we
+need a generic way of getting and setting these. Device-mapper
+messages are used. Refer to cache-policies.txt.
+
+Discard bitset resolution
+-------------------------
+
+We can avoid copying data during migration if we know the block has
+been discarded. A prime example of this is when mkfs discards the
+whole block device. We store a bitset tracking the discard state of
+blocks. However, we allow this bitset to have a different block size
+from the cache blocks. This is because we need to track the discard
+state for all of the origin device (compare with the dirty bitset
+which is just for the smaller cache device).
+
+Target interface
+================
+
+Constructor
+-----------
+
+ cache <metadata dev> <cache dev> <origin dev> <block size>
+ <#feature args> [<feature arg>]*
+ <policy> <#policy args> [policy args]*
+
+ metadata dev : fast device holding the persistent metadata
+ cache dev : fast device holding cached data blocks
+ origin dev : slow device holding original data blocks
+ block size : cache unit size in sectors
+
+ #feature args : number of feature arguments passed
+ feature args : writethrough. (The default is writeback.)
+
+ policy : the replacement policy to use
+ #policy args : an even number of arguments corresponding to
+ key/value pairs passed to the policy
+ policy args : key/value pairs passed to the policy
+ E.g. 'sequential_threshold 1024'
+ See cache-policies.txt for details.
+
+Optional feature arguments are:
+ writethrough : write through caching that prohibits cache block
+ content from being different from origin block content.
+ Without this argument, the default behaviour is to write
+ back cache block contents later for performance reasons,
+ so they may differ from the corresponding origin blocks.
+
+A policy called 'default' is always registered. This is an alias for
+the policy we currently think is giving best all round performance.
+
+As the default policy could vary between kernels, if you are relying on
+the characteristics of a specific policy, always request it by name.
+
+Status
+------
+
+<#used metadata blocks>/<#total metadata blocks> <#read hits> <#read misses>
+<#write hits> <#write misses> <#demotions> <#promotions> <#blocks in cache>
+<#dirty> <#features> <features>* <#core args> <core args>* <#policy args>
+<policy args>*
+
+#used metadata blocks : Number of metadata blocks used
+#total metadata blocks : Total number of metadata blocks
+#read hits : Number of times a READ bio has been mapped
+ to the cache
+#read misses : Number of times a READ bio has been mapped
+ to the origin
+#write hits : Number of times a WRITE bio has been mapped
+ to the cache
+#write misses : Number of times a WRITE bio has been
+ mapped to the origin
+#demotions : Number of times a block has been removed
+ from the cache
+#promotions : Number of times a block has been moved to
+ the cache
+#blocks in cache : Number of blocks resident in the cache
+#dirty : Number of blocks in the cache that differ
+ from the origin
+#feature args : Number of feature args to follow
+feature args : 'writethrough' (optional)
+#core args : Number of core arguments (must be even)
+core args : Key/value pairs for tuning the core
+ e.g. migration_threshold
+#policy args : Number of policy arguments to follow (must be even)
+policy args : Key/value pairs
+ e.g. 'sequential_threshold 1024'
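The status fields listed above could be parsed along the following lines. This is a sketch only: `parse_cache_status` is a hypothetical helper, the sample line is made up for illustration, and it handles just the target-specific part of the status output as described in this section.

```python
COUNTERS = ["read_hits", "read_misses", "write_hits", "write_misses",
            "demotions", "promotions", "blocks_in_cache", "dirty"]


def parse_cache_status(status):
    """Split a cache status line into named fields (format as listed above)."""
    tokens = status.split()
    used, total = (int(x) for x in tokens[0].split("/"))
    result = {"used_metadata_blocks": used, "total_metadata_blocks": total}
    for name, token in zip(COUNTERS, tokens[1:9]):
        result[name] = int(token)

    pos = 9
    count = int(tokens[pos])                      # <#features>
    result["features"] = tokens[pos + 1:pos + 1 + count]
    pos += 1 + count

    for key in ("core_args", "policy_args"):      # both are key/value lists
        count = int(tokens[pos])
        if count % 2:
            raise ValueError(f"{key}: argument count must be even")
        values = tokens[pos + 1:pos + 1 + count]
        result[key] = dict(zip(values[0::2], values[1::2]))
        pos += 1 + count
    return result


# Made-up sample line, for illustration only.
sample = ("23/4096 100 20 50 5 1 7 300 12 1 writethrough "
          "2 migration_threshold 204800 "
          "4 sequential_threshold 1024 random_threshold 8")
info = parse_cache_status(sample)
print(info["dirty"], info["features"], info["policy_args"])
```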
+
+Messages
+--------
+
+Policies will have different tunables, specific to each one, so we
+need a generic way of getting and setting these. Device-mapper
+messages are used. (A sysfs interface would also be possible.)
+
+The message format is:
+
+ <key> <value>
+
+E.g.
+ dmsetup message my_cache 0 sequential_threshold 1024
+
+Examples
+========
+
+The test suite can be found here:
+
+https://github.com/jthornber/thinp-test-suite
+
+dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+ /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
+dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+ /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
+ mq 4 sequential_threshold 1024 random_threshold 8'
diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt
index 56fb62b09fc5..b428556197c9 100644
--- a/Documentation/device-mapper/dm-raid.txt
+++ b/Documentation/device-mapper/dm-raid.txt
@@ -30,6 +30,7 @@ The target is named "raid" and it accepts the following parameters:
raid10 Various RAID10 inspired algorithms chosen by additional params
- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
- RAID1E: Integrated Adjacent Stripe Mirroring
+ - RAID1E: Integrated Offset Stripe Mirroring
- and other similar RAID10 variants
Reference: Chapter 4 of
@@ -64,15 +65,15 @@ The target is named "raid" and it accepts the following parameters:
synchronisation state for each region.
[raid10_copies <# copies>]
- [raid10_format near]
+ [raid10_format <near|far|offset>]
These two options are used to alter the default layout of
a RAID10 configuration. The number of copies is can be
- specified, but the default is 2. There are other variations
- to how the copies are laid down - the default and only current
- option is "near". Near copies are what most people think of
- with respect to mirroring. If these options are left
- unspecified, or 'raid10_copies 2' and/or 'raid10_format near'
- are given, then the layouts for 2, 3 and 4 devices are:
+ specified, but the default is 2. There are also three
+ variations to how the copies are laid down - the default
+ is "near". Near copies are what most people think of with
+ respect to mirroring. If these options are left unspecified,
+ or 'raid10_copies 2' and/or 'raid10_format near' are given,
+ then the layouts for 2, 3 and 4 devices are:
2 drives 3 drives 4 drives
-------- ---------- --------------
A1 A1 A1 A1 A2 A1 A1 A2 A2
@@ -85,6 +86,33 @@ The target is named "raid" and it accepts the following parameters:
3-device layout is what might be called a 'RAID1E - Integrated
Adjacent Stripe Mirroring'.
+ If 'raid10_copies 2' and 'raid10_format far', then the layouts
+ for 2, 3 and 4 devices are:
+        2 drives       3 drives            4 drives
+        --------    --------------    --------------------
+        A1  A2      A1   A2   A3      A1   A2   A3   A4
+        A3  A4      A4   A5   A6      A5   A6   A7   A8
+        A5  A6      A7   A8   A9      A9   A10  A11  A12
+        ..  ..      ..   ..   ..      ..   ..   ..   ..
+        A2  A1      A3   A1   A2      A2   A1   A4   A3
+        A4  A3      A6   A4   A5      A6   A5   A8   A7
+        A6  A5      A9   A7   A8      A10  A9   A12  A11
+        ..  ..      ..   ..   ..      ..   ..   ..   ..
+
+ If 'raid10_copies 2' and 'raid10_format offset', then the
+ layouts for 2, 3 and 4 devices are:
+        2 drives      3 drives           4 drives
+        --------    ------------    -----------------
+        A1  A2      A1  A2  A3      A1   A2   A3   A4
+        A2  A1      A3  A1  A2      A2   A1   A4   A3
+        A3  A4      A4  A5  A6      A5   A6   A7   A8
+        A4  A3      A6  A4  A5      A6   A5   A8   A7
+        A5  A6      A7  A8  A9      A9   A10  A11  A12
+        A6  A5      A9  A7  A8      A10  A9   A12  A11
+        ..  ..      ..  ..  ..      ..   ..   ..   ..
+ Here we see layouts closely akin to 'RAID1E - Integrated
+ Offset Stripe Mirroring'.
+
<#raid_devs>: The number of devices composing the array.
Each device consists of two entries. The first is the device
containing the metadata (if any); the second is the one containing the
@@ -142,3 +170,5 @@ Version History
1.3.0 Added support for RAID 10
1.3.1 Allow device replacement/rebuild for RAID 10
1.3.2 Fix/improve redundancy checking for RAID10
+1.4.0 Non-functional change. Removes arg from mapping function.
+1.4.1 Add RAID10 "far" and "offset" algorithm support.
diff --git a/Documentation/devicetree/bindings/arc/interrupts.txt b/Documentation/devicetree/bindings/arc/interrupts.txt
new file mode 100644
index 000000000000..9a5d562435ea
--- /dev/null
+++ b/Documentation/devicetree/bindings/arc/interrupts.txt
@@ -0,0 +1,24 @@
+* ARC700 incore Interrupt Controller
+
+ The core interrupt controller provides 32 prioritised interrupts (2 levels)
+ to ARC700 core.
+
+Properties:
+
+- compatible: "snps,arc700-intc"
+- interrupt-controller: This is an interrupt controller.
+- #interrupt-cells: Must be <1>.
+
+ The single-cell "interrupts" property of a device specifies the IRQ number,
+ between 0 and 31.
+
+ The intc is accessed via the special ARC AUX register interface, hence the
+ "reg" property is not specified.
+
+Example:
+
+ intc: interrupt-controller {
+ compatible = "snps,arc700-intc";
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ };
diff --git a/Documentation/devicetree/bindings/arm/armadeus.txt b/Documentation/devicetree/bindings/arm/armadeus.txt
new file mode 100644
index 000000000000..9821283ff516
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/armadeus.txt
@@ -0,0 +1,6 @@
+Armadeus i.MX Platforms Device Tree Bindings
+-----------------------------------------------
+
+APF51: i.MX51 based module.
+Required root node properties:
+ - compatible = "armadeus,imx51-apf51", "fsl,imx51";
diff --git a/Documentation/devicetree/bindings/arm/atmel-adc.txt b/Documentation/devicetree/bindings/arm/atmel-adc.txt
index c63097d6afeb..16769d9cedd6 100644
--- a/Documentation/devicetree/bindings/arm/atmel-adc.txt
+++ b/Documentation/devicetree/bindings/arm/atmel-adc.txt
@@ -14,9 +14,19 @@ Required properties:
- atmel,adc-status-register: Offset of the Interrupt Status Register
- atmel,adc-trigger-register: Offset of the Trigger Register
- atmel,adc-vref: Reference voltage in millivolts for the conversions
+ - atmel,adc-res: List of resolutions in bits supported by the ADC. The list
+ must contain at least two entries.
+ - atmel,adc-res-names: Contains one identifier string for each resolution
+ in atmel,adc-res property. "lowres" and "highres"
+ identifiers are required.
Optional properties:
- atmel,adc-use-external: Boolean to enable of external triggers
+ - atmel,adc-use-res: String corresponding to an identifier from
+ atmel,adc-res-names property. If not specified, the highest
+ resolution will be used.
+ - atmel,adc-sleep-mode: Boolean to enable sleep mode when no conversion is
+ in progress
+ - atmel,adc-sample-hold-time: Sample and Hold Time in microseconds
Optional trigger Nodes:
- Required properties:
@@ -40,6 +50,9 @@ adc0: adc@fffb0000 {
atmel,adc-trigger-register = <0x08>;
atmel,adc-use-external;
atmel,adc-vref = <3300>;
+ atmel,adc-res = <8 10>;
+ atmel,adc-res-names = "lowres", "highres";
+ atmel,adc-use-res = "lowres";
trigger@0 {
trigger-name = "external-rising";
diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt
index f79818711e83..e935d7d4ac43 100644
--- a/Documentation/devicetree/bindings/arm/fsl.txt
+++ b/Documentation/devicetree/bindings/arm/fsl.txt
@@ -5,6 +5,14 @@ i.MX23 Evaluation Kit
Required root node properties:
- compatible = "fsl,imx23-evk", "fsl,imx23";
+i.MX25 Product Development Kit
+Required root node properties:
+ - compatible = "fsl,imx25-pdk", "fsl,imx25";
+
+i.MX27 Product Development Kit
+Required root node properties:
+ - compatible = "fsl,imx27-pdk", "fsl,imx27";
+
i.MX28 Evaluation Kit
Required root node properties:
- compatible = "fsl,imx28-evk", "fsl,imx28";
diff --git a/Documentation/devicetree/bindings/arm/msm/ssbi.txt b/Documentation/devicetree/bindings/arm/msm/ssbi.txt
new file mode 100644
index 000000000000..54fd5ced3401
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/msm/ssbi.txt
@@ -0,0 +1,18 @@
+* Qualcomm SSBI
+
+Some Qualcomm MSM devices contain a point-to-point serial bus used to
+communicate with a limited range of devices (mostly power management
+chips).
+
+These require the following properties:
+
+- compatible: "qcom,ssbi"
+
+- qcom,controller-type
+ indicates the SSBI bus variant the controller should use to talk
+ with the slave device. This should be one of "ssbi", "ssbi2", or
+ "pmic-arbiter". The type chosen is determined by the attached
+ slave.
+
+The slave device should be the single child node of the ssbi device
+with a compatible field.
diff --git a/Documentation/devicetree/bindings/arm/samsung/exynos-adc.txt b/Documentation/devicetree/bindings/arm/samsung/exynos-adc.txt
new file mode 100644
index 000000000000..47ada1dff216
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/samsung/exynos-adc.txt
@@ -0,0 +1,60 @@
+Samsung Exynos Analog to Digital Converter bindings
+
+The devicetree bindings are for the new ADC driver written for
+Exynos4 and upward SoCs from Samsung.
+
+New driver handles the following
+1. Supports ADC IF found on EXYNOS4412/EXYNOS5250
+ and future SoCs from Samsung
+2. Add ADC driver under iio/adc framework
+3. Also adds the Documentation for device tree bindings
+
+Required properties:
+- compatible: Must be "samsung,exynos-adc-v1"
+ for exynos4412/5250 controllers.
+ Must be "samsung,exynos-adc-v2" for
+ future controllers.
+- reg: Contains ADC register address range (base address and
+ length) and the address of the phy enable register.
+- interrupts: Contains the interrupt information for the timer. The
+ format depends on which interrupt controller
+ the Samsung device uses.
+- #io-channel-cells = <1>; As ADC has multiple outputs
+- clocks From common clock binding: handle to adc clock.
+- clock-names From common clock binding: Shall be "adc".
+- vdd-supply VDD input supply.
+
+Note: child nodes can be added for auto probing from device tree.
+
+Example: adding device info in dtsi file
+
+adc: adc@12D10000 {
+ compatible = "samsung,exynos-adc-v1";
+ reg = <0x12D10000 0x100>, <0x10040718 0x4>;
+ interrupts = <0 106 0>;
+ #io-channel-cells = <1>;
+ io-channel-ranges;
+
+ clocks = <&clock 303>;
+ clock-names = "adc";
+
+ vdd-supply = <&buck5_reg>;
+};
+
+
+Example: Adding child nodes in dts file
+
+adc@12D10000 {
+
+ /* NTC thermistor is a hwmon device */
+ ncp15wb473@0 {
+ compatible = "ntc,ncp15wb473";
+ pullup-uV = <1800000>;
+ pullup-ohm = <47000>;
+ pulldown-ohm = <0>;
+ io-channels = <&adc 4>;
+ };
+};
+
+Note: Does not apply to ADC driver under arch/arm/plat-samsung/
+Note: The child node can be added under the adc node or separately.
diff --git a/Documentation/devicetree/bindings/clock/axi-clkgen.txt b/Documentation/devicetree/bindings/clock/axi-clkgen.txt
new file mode 100644
index 000000000000..028b493e97ff
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/axi-clkgen.txt
@@ -0,0 +1,22 @@
+Binding for the axi-clkgen clock generator
+
+This binding uses the common clock binding[1].
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+Required properties:
+- compatible : shall be "adi,axi-clkgen".
+- #clock-cells : from common clock binding; Should always be set to 0.
+- reg : Address and length of the axi-clkgen register set.
+- clocks : Phandle and clock specifier for the parent clock.
+
+Optional properties:
+- clock-output-names : From common clock binding.
+
+Example:
+ clock@0xff000000 {
+ compatible = "adi,axi-clkgen";
+ #clock-cells = <0>;
+ reg = <0xff000000 0x1000>;
+ clocks = <&osc 1>;
+ };
diff --git a/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt b/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
new file mode 100644
index 000000000000..5757f9abfc26
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
@@ -0,0 +1,24 @@
+Binding for simple fixed factor rate clock sources.
+
+This binding uses the common clock binding[1].
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+Required properties:
+- compatible : shall be "fixed-factor-clock".
+- #clock-cells : from common clock binding; shall be set to 0.
+- clock-div: fixed divider.
+- clock-mult: fixed multiplier.
+- clocks: parent clock.
+
+Optional properties:
+- clock-output-names : From common clock binding.
+
+Example:
+ clock {
+ compatible = "fixed-factor-clock";
+ clocks = <&parentclk>;
+ #clock-cells = <0>;
+ clock-div = <2>;
+ clock-mult = <1>;
+ };
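
Worked example for the fixed-factor binding above. This is a minimal sketch,
assuming a hypothetical 24 MHz fixed-clock parent; the ref24M and half_clk
names are illustrative and not part of the binding. The output rate is the
parent rate multiplied by clock-mult and divided by clock-div, so this node
would provide 24 MHz * 1 / 2 = 12 MHz.

```dts
/* Hypothetical 24 MHz parent, present only for this illustration */
ref24M: ref24M {
	compatible = "fixed-clock";
	#clock-cells = <0>;
	clock-frequency = <24000000>;
};

/* 24000000 * clock-mult (1) / clock-div (2) = 12000000 Hz */
half_clk: half-clk {
	compatible = "fixed-factor-clock";
	#clock-cells = <0>;
	clocks = <&ref24M>;
	clock-div = <2>;
	clock-mult = <1>;
	clock-output-names = "half";	/* optional, illustrative name */
};
```

Because #clock-cells is 0, a consumer would reference this clock with the
phandle alone, for example clocks = <&half_clk>;.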
diff --git a/Documentation/devicetree/bindings/clock/imx5-clock.txt b/Documentation/devicetree/bindings/clock/imx5-clock.txt
index 04ad47876be0..2a0c904c46ae 100644
--- a/Documentation/devicetree/bindings/clock/imx5-clock.txt
+++ b/Documentation/devicetree/bindings/clock/imx5-clock.txt
@@ -171,6 +171,7 @@ clocks and IDs.
can_sel 156
can1_serial_gate 157
can1_ipg_gate 158
+ owire_gate 159
Examples (for mx53):
diff --git a/Documentation/devicetree/bindings/clock/imx6q-clock.txt b/Documentation/devicetree/bindings/clock/imx6q-clock.txt
index f73fdf595568..969b38e06ad3 100644
--- a/Documentation/devicetree/bindings/clock/imx6q-clock.txt
+++ b/Documentation/devicetree/bindings/clock/imx6q-clock.txt
@@ -203,6 +203,8 @@ clocks and IDs.
pcie_ref 188
pcie_ref_125m 189
enet_ref 190
+ usbphy1_gate 191
+ usbphy2_gate 192
Examples:
diff --git a/Documentation/devicetree/bindings/clock/silabs,si5351.txt b/Documentation/devicetree/bindings/clock/silabs,si5351.txt
new file mode 100644
index 000000000000..cc374651662c
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/silabs,si5351.txt
@@ -0,0 +1,114 @@
+Binding for Silicon Labs Si5351a/b/c programmable i2c clock generator.
+
+Reference
+[1] Si5351A/B/C Data Sheet
+ http://www.silabs.com/Support%20Documents/TechnicalDocs/Si5351.pdf
+
+The Si5351a/b/c are programmable i2c clock generators with up to 8 output
+clocks. Si5351a also has a reduced pin-count package (MSOP10) where only
+3 output clocks are accessible. The internal structure of the clock
+generators can be found in [1].
+
+==I2C device node==
+
+Required properties:
+- compatible: shall be one of "silabs,si5351{a,a-msop,b,c}".
+- reg: i2c device address, shall be 0x60 or 0x61.
+- #clock-cells: from common clock binding; shall be set to 1.
+- clocks: from common clock binding; list of parent clock
+ handles, shall be xtal reference clock or xtal and clkin for
+ si5351c only.
+- #address-cells: shall be set to 1.
+- #size-cells: shall be set to 0.
+
+Optional properties:
+- silabs,pll-source: pair of (number, source) for each pll. Allows the clock
+ source of pll A (number=0) or B (number=1) to be overwritten.
+
+==Child nodes==
+
+Each of the clock outputs can be overwritten individually by
+using a child node to the I2C device node. If a child node for a clock
+output is not set, the eeprom configuration is not overwritten.
+
+Required child node properties:
+- reg: number of clock output.
+
+Optional child node properties:
+- silabs,clock-source: source clock of the output divider stage N, shall be
+ 0 = multisynth N
+ 1 = multisynth 0 for output clocks 0-3, else multisynth 4
+ 2 = xtal
+ 3 = clkin (si5351c only)
+- silabs,drive-strength: output drive strength in mA, shall be one of {2,4,6,8}.
+- silabs,multisynth-source: source pll A(0) or B(1) of corresponding multisynth
+ divider.
+- silabs,pll-master: boolean, multisynth can change pll frequency.
+
+==Example==
+
+/* 25MHz reference crystal */
+ref25: ref25M {
+ compatible = "fixed-clock";
+ #clock-cells = <0>;
+ clock-frequency = <25000000>;
+};
+
+i2c-master-node {
+
+ /* Si5351a msop10 i2c clock generator */
+ si5351a: clock-generator@60 {
+ compatible = "silabs,si5351a-msop";
+ reg = <0x60>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ #clock-cells = <1>;
+
+ /* connect xtal input to 25MHz reference */
+ clocks = <&ref25>;
+
+ /* connect xtal input as source of pll0 and pll1 */
+ silabs,pll-source = <0 0>, <1 0>;
+
+ /*
+ * overwrite clkout0 configuration with:
+ * - 8mA output drive strength
+ * - pll0 as clock source of multisynth0
+ * - multisynth0 as clock source of output divider
+ * - multisynth0 can change pll0
+ * - set initial clock frequency of 74.25MHz
+ */
+ clkout0 {
+ reg = <0>;
+ silabs,drive-strength = <8>;
+ silabs,multisynth-source = <0>;
+ silabs,clock-source = <0>;
+ silabs,pll-master;
+ clock-frequency = <74250000>;
+ };
+
+ /*
+ * overwrite clkout1 configuration with:
+ * - 4mA output drive strength
+ * - pll1 as clock source of multisynth1
+ * - multisynth1 as clock source of output divider
+ * - multisynth1 can change pll1
+ */
+ clkout1 {
+ reg = <1>;
+ silabs,drive-strength = <4>;
+ silabs,multisynth-source = <1>;
+ silabs,clock-source = <0>;
+ silabs,pll-master;
+ };
+
+ /*
+ * overwrite clkout2 configuration with:
+ * - xtal as clock source of output divider
+ */
+ clkout2 {
+ reg = <2>;
+ silabs,clock-source = <2>;
+ };
+ };
+};
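
Consumer sketch for the Si5351 binding above. The binding sets #clock-cells
to 1 but does not spell out what the cell means; this example assumes the
cell selects the clock output number, matching the reg value of the child
nodes. The consumer node name, its compatible string and the "pixclk" name
are hypothetical placeholders.

```dts
/* Hypothetical consumer of clkout0 of the si5351a node shown above */
video-encoder {
	compatible = "some-vendor,some-encoder";	/* placeholder */

	/* one specifier cell: assumed to be the clock output number */
	clocks = <&si5351a 0>;
	clock-names = "pixclk";				/* placeholder */
};
```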
diff --git a/Documentation/devicetree/bindings/clock/sunxi.txt b/Documentation/devicetree/bindings/clock/sunxi.txt
new file mode 100644
index 000000000000..729f52426fe1
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/sunxi.txt
@@ -0,0 +1,151 @@
+Device Tree Clock bindings for arch-sunxi
+
+This binding uses the common clock binding[1].
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+Required properties:
+- compatible : shall be one of the following:
+ "allwinner,sun4i-osc-clk" - for a gatable oscillator
+ "allwinner,sun4i-pll1-clk" - for the main PLL clock
+ "allwinner,sun4i-cpu-clk" - for the CPU multiplexer clock
+ "allwinner,sun4i-axi-clk" - for the AXI clock
+ "allwinner,sun4i-axi-gates-clk" - for the AXI gates
+ "allwinner,sun4i-ahb-clk" - for the AHB clock
+ "allwinner,sun4i-ahb-gates-clk" - for the AHB gates
+ "allwinner,sun4i-apb0-clk" - for the APB0 clock
+ "allwinner,sun4i-apb0-gates-clk" - for the APB0 gates
+ "allwinner,sun4i-apb1-clk" - for the APB1 clock
+ "allwinner,sun4i-apb1-mux-clk" - for the APB1 clock muxing
+ "allwinner,sun4i-apb1-gates-clk" - for the APB1 gates
+
+Required properties for all clocks:
+- reg : shall be the control register address for the clock.
+- clocks : shall be the input parent clock(s) phandle for the clock
+- #clock-cells : from common clock binding; shall be set to 0 except for
+ "allwinner,sun4i-*-gates-clk" where it shall be set to 1
+
+Additionally, "allwinner,sun4i-*-gates-clk" clocks require:
+- clock-output-names : the corresponding gate names that the clock controls
+
+For example:
+
+osc24M: osc24M@01c20050 {
+ #clock-cells = <0>;
+ compatible = "allwinner,sun4i-osc-clk";
+ reg = <0x01c20050 0x4>;
+ clocks = <&osc24M_fixed>;
+};
+
+pll1: pll1@01c20000 {
+ #clock-cells = <0>;
+ compatible = "allwinner,sun4i-pll1-clk";
+ reg = <0x01c20000 0x4>;
+ clocks = <&osc24M>;
+};
+
+cpu: cpu@01c20054 {
+ #clock-cells = <0>;
+ compatible = "allwinner,sun4i-cpu-clk";
+ reg = <0x01c20054 0x4>;
+ clocks = <&osc32k>, <&osc24M>, <&pll1>;
+};
+
+
+
+Gate clock outputs
+
+The "allwinner,sun4i-*-gates-clk" clocks provide several gatable outputs;
+their corresponding offsets as present on sun4i are listed below. Note that
+some of these gates are not present on sun5i.
+
+ * AXI gates ("allwinner,sun4i-axi-gates-clk")
+
+ DRAM 0
+
+ * AHB gates ("allwinner,sun4i-ahb-gates-clk")
+
+ USB0 0
+ EHCI0 1
+ OHCI0 2*
+ EHCI1 3
+ OHCI1 4*
+ SS 5
+ DMA 6
+ BIST 7
+ MMC0 8
+ MMC1 9
+ MMC2 10
+ MMC3 11
+ MS 12**
+ NAND 13
+ SDRAM 14
+
+ ACE 16
+ EMAC 17
+ TS 18
+
+ SPI0 20
+ SPI1 21
+ SPI2 22
+ SPI3 23
+ PATA 24
+ SATA 25**
+ GPS 26*
+
+ VE 32
+ TVD 33
+ TVE0 34
+ TVE1 35
+ LCD0 36
+ LCD1 37
+
+ CSI0 40
+ CSI1 41
+
+ HDMI 43
+ DE_BE0 44
+ DE_BE1 45
+ DE_FE0 46
+ DE_FE1 47
+
+ MP 50
+
+ MALI400 52
+
+ * APB0 gates ("allwinner,sun4i-apb0-gates-clk")
+
+ CODEC 0
+ SPDIF 1*
+ AC97 2
+ IIS 3
+
+ PIO 5
+ IR0 6
+ IR1 7
+
+ KEYPAD 10
+
+ * APB1 gates ("allwinner,sun4i-apb1-gates-clk")
+
+ I2C0 0
+ I2C1 1
+ I2C2 2
+
+ CAN 4
+ SCR 5
+ PS20 6
+ PS21 7
+
+ UART0 16
+ UART1 17
+ UART2 18
+ UART3 19
+ UART4 20
+ UART5 21
+ UART6 22
+ UART7 23
+
+Notation:
+ [*]: The datasheet does not mention these gates, but they are present in AW code
+ [**]: The datasheet marks these gates as "NC", but they are used in AW code
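
Gate clock sketch for the sunxi binding above. It assumes an apb1 parent
clock defined elsewhere; the register addresses, the gate names in
clock-output-names and the serial node are illustrative and should be
checked against the SoC documentation. With #clock-cells set to 1, the
specifier cell is the gate offset from the tables above (16 for UART0 in the
APB1 gates).

```dts
/* APB1 gates provider; address and output names are illustrative */
apb1_gates: apb1_gates@01c2006c {
	#clock-cells = <1>;
	compatible = "allwinner,sun4i-apb1-gates-clk";
	reg = <0x01c2006c 0x4>;
	clocks = <&apb1>;	/* assumed to be defined elsewhere */
	clock-output-names = "apb1_i2c0", "apb1_i2c1", "apb1_i2c2",
		"apb1_can", "apb1_scr", "apb1_ps20", "apb1_ps21",
		"apb1_uart0", "apb1_uart1", "apb1_uart2", "apb1_uart3",
		"apb1_uart4", "apb1_uart5", "apb1_uart6", "apb1_uart7";
};

/* Hypothetical consumer: UART0 is offset 16 in the APB1 gates table */
uart0: serial@01c28000 {
	/* other properties omitted */
	clocks = <&apb1_gates 16>;
};
```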
diff --git a/Documentation/devicetree/bindings/dma/snps-dma.txt b/Documentation/devicetree/bindings/dma/snps-dma.txt
index 5bb3dfb6f1d8..d58675ea1abf 100644
--- a/Documentation/devicetree/bindings/dma/snps-dma.txt
+++ b/Documentation/devicetree/bindings/dma/snps-dma.txt
@@ -3,59 +3,61 @@
Required properties:
- compatible: "snps,dma-spear1340"
- reg: Address range of the DMAC registers
-- interrupt-parent: Should be the phandle for the interrupt controller
- that services interrupts for this device
- interrupt: Should contain the DMAC interrupt number
-- nr_channels: Number of channels supported by hardware
-- is_private: The device channels should be marked as private and not for by the
- general purpose DMA channel allocator. False if not passed.
+- dma-channels: Number of channels supported by hardware
+- dma-requests: Number of DMA request lines supported, up to 16
+- dma-masters: Number of AHB masters supported by the controller
+- #dma-cells: must be <3>
- chan_allocation_order: order of allocation of channel, 0 (default): ascending,
1: descending
- chan_priority: priority of channels. 0 (default): increase from chan 0->n, 1:
increase from chan n->0
- block_size: Maximum block size supported by the controller
-- nr_masters: Number of AHB masters supported by the controller
- data_width: Maximum data width supported by hardware per AHB master
(0 - 8bits, 1 - 16bits, ..., 5 - 256bits)
-- slave_info:
- - bus_id: name of this device channel, not just a device name since
- devices may have more than one channel e.g. "foo_tx". For using the
- dw_generic_filter(), slave drivers must pass exactly this string as
- param to filter function.
- - cfg_hi: Platform-specific initializer for the CFG_HI register
- - cfg_lo: Platform-specific initializer for the CFG_LO register
- - src_master: src master for transfers on allocated channel.
- - dst_master: dest master for transfers on allocated channel.
+
+
+Optional properties:
+- interrupt-parent: Should be the phandle for the interrupt controller
+ that services interrupts for this device
+- is_private: The device channels should be marked as private and not for use
+ by the general purpose DMA channel allocator. False if not passed.
Example:
- dma@fc000000 {
+ dmahost: dma@fc000000 {
compatible = "snps,dma-spear1340";
reg = <0xfc000000 0x1000>;
interrupt-parent = <&vic1>;
interrupts = <12>;
- nr_channels = <8>;
+ dma-channels = <8>;
+ dma-requests = <16>;
+ dma-masters = <2>;
+ #dma-cells = <3>;
chan_allocation_order = <1>;
chan_priority = <1>;
block_size = <0xfff>;
- nr_masters = <2>;
data_width = <3 3 0 0>;
+ };
- slave_info {
- uart0-tx {
- bus_id = "uart0-tx";
- cfg_hi = <0x4000>; /* 0x8 << 11 */
- cfg_lo = <0>;
- src_master = <0>;
- dst_master = <1>;
- };
- spi0-tx {
- bus_id = "spi0-tx";
- cfg_hi = <0x2000>; /* 0x4 << 11 */
- cfg_lo = <0>;
- src_master = <0>;
- dst_master = <0>;
- };
- };
+DMA clients connected to the Designware DMA controller must use the format
+described in the dma.txt file, using a four-cell specifier for each channel.
+The four cells in order are:
+
+1. A phandle pointing to the DMA controller
+2. The DMA request line number
+3. Source master for transfers on allocated channel
+4. Destination master for transfers on allocated channel
+
+Example:
+
+ serial@e0000000 {
+ compatible = "arm,pl011", "arm,primecell";
+ reg = <0xe0000000 0x1000>;
+ interrupts = <0 35 0x4>;
+ status = "disabled";
+ dmas = <&dmahost 12 0 1>,
+ <&dmahost 13 1 0>;
+ dma-names = "rx", "tx";
};
diff --git a/Documentation/devicetree/bindings/gpio/gpio.txt b/Documentation/devicetree/bindings/gpio/gpio.txt
index a33628759d36..d933af370697 100644
--- a/Documentation/devicetree/bindings/gpio/gpio.txt
+++ b/Documentation/devicetree/bindings/gpio/gpio.txt
@@ -98,7 +98,7 @@ announce the pinrange to the pin ctrl subsystem. For example,
compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank";
reg = <0x1460 0x18>;
gpio-controller;
- gpio-ranges = <&pinctrl1 20 10>, <&pinctrl2 50 20>;
+ gpio-ranges = <&pinctrl1 0 20 10>, <&pinctrl2 10 50 20>;
}
@@ -107,8 +107,8 @@ where,
Next values specify the base pin and number of pins for the range
handled by 'qe_pio_e' gpio. In the given example from base pin 20 to
- pin 29 under pinctrl1 and pin 50 to pin 69 under pinctrl2 is handled
- by this gpio controller.
+ pin 29 under pinctrl1 with gpio offset 0 and pin 50 to pin 69 under
+ pinctrl2 with gpio offset 10 is handled by this gpio controller.
The pinctrl node must have "#gpio-range-cells" property to show number of
arguments to pass with phandle from gpio controllers node.
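
Annotated sketch of the updated gpio-ranges format. The cell layout (GPIO
offset, pin base, number of pins) and the resulting mapping follow the
description above; the node name is an assumption derived from the reg value.

```dts
qe_pio_e: gpio-controller@1460 {
	compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank";
	reg = <0x1460 0x18>;
	gpio-controller;

	/* <phandle  gpio-offset  pin-base  number-of-pins> */
	gpio-ranges = <&pinctrl1 0 20 10>,	/* GPIO 0..9   -> pinctrl1 pins 20..29 */
		      <&pinctrl2 10 50 20>;	/* GPIO 10..29 -> pinctrl2 pins 50..69 */
};
```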
diff --git a/Documentation/devicetree/bindings/hwmon/ntc_thermistor.txt b/Documentation/devicetree/bindings/hwmon/ntc_thermistor.txt
new file mode 100644
index 000000000000..c6f66674f19c
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/ntc_thermistor.txt
@@ -0,0 +1,29 @@
+NTC Thermistor hwmon sensors
+-------------------------------
+
+Required node properties:
+- "compatible" value : one of
+ "ntc,ncp15wb473"
+ "ntc,ncp18wb473"
+ "ntc,ncp21wb473"
+ "ntc,ncp03wb473"
+ "ntc,ncp15wl333"
+- "pullup-uv" Pull up voltage in micro volts
+- "pullup-ohm" Pull up resistor value in ohms
+- "pulldown-ohm" Pull down resistor value in ohms
+- "connected-positive" Always ON, If not specified.
+ Status change is possible.
+- "io-channels" Channel node of ADC to be used for
+ conversion.
+
+Read more about iio bindings at
+ Documentation/devicetree/bindings/iio/iio-bindings.txt
+
+Example:
+ ncp15wb473@0 {
+ compatible = "ntc,ncp15wb473";
+ pullup-uv = <1800000>;
+ pullup-ohm = <47000>;
+ pulldown-ohm = <0>;
+ io-channels = <&adc 3>;
+ };
diff --git a/Documentation/devicetree/bindings/iio/iio-bindings.txt b/Documentation/devicetree/bindings/iio/iio-bindings.txt
new file mode 100644
index 000000000000..0b447d9ad196
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/iio-bindings.txt
@@ -0,0 +1,97 @@
+This binding is derived from clock bindings, and based on suggestions
+from Lars-Peter Clausen [1].
+
+Sources of IIO channels can be represented by any node in the device
+tree. Those nodes are designated as IIO providers. IIO consumer
+nodes use a phandle and IIO specifier pair to connect IIO provider
+outputs to IIO inputs. Similar to the gpio specifiers, an IIO
+specifier is an array of one or more cells identifying the IIO
+output on a device. The length of an IIO specifier is defined by the
+value of a #io-channel-cells property in the IIO provider node.
+
+[1] http://marc.info/?l=linux-iio&m=135902119507483&w=2
+
+==IIO providers==
+
+Required properties:
+#io-channel-cells: Number of cells in an IIO specifier; Typically 0 for nodes
+ with a single IIO output and 1 for nodes with multiple
+ IIO outputs.
+
+Example for a simple configuration with no trigger:
+
+ adc: voltage-sensor@35 {
+ compatible = "maxim,max1139";
+ reg = <0x35>;
+ #io-channel-cells = <1>;
+ };
+
+Example for a configuration with trigger:
+
+ adc@35 {
+ compatible = "some-vendor,some-adc";
+ reg = <0x35>;
+
+ adc1: iio-device@0 {
+ #io-channel-cells = <1>;
+ /* other properties */
+ };
+ adc2: iio-device@1 {
+ #io-channel-cells = <1>;
+ /* other properties */
+ };
+ };
+
+==IIO consumers==
+
+Required properties:
+io-channels: List of phandle and IIO specifier pairs, one pair
+ for each IIO input to the device. Note: if the
+ IIO provider specifies '0' for #io-channel-cells,
+ then only the phandle portion of the pair will appear.
+
+Optional properties:
+io-channel-names:
+ List of IIO input name strings sorted in the same
+ order as the io-channels property. Consumer drivers
+ will use io-channel-names to match IIO input names
+ with IIO specifiers.
+io-channel-ranges:
+ Empty property indicating that child nodes can inherit named
+ IIO channels from this node. Useful for bus nodes to provide
+ an IIO channel to their children.
+
+For example:
+
+ device {
+ io-channels = <&adc 1>, <&ref 0>;
+ io-channel-names = "vcc", "vdd";
+ };
+
+This represents a device with two IIO inputs, named "vcc" and "vdd".
+The vcc channel is connected to output 1 of the &adc device, and the
+vdd channel is connected to output 0 of the &ref device.
+
+==Example==
+
+ adc: max1139@35 {
+ compatible = "maxim,max1139";
+ reg = <0x35>;
+ #io-channel-cells = <1>;
+ };
+
+ ...
+
+ iio_hwmon {
+ compatible = "iio-hwmon";
+ io-channels = <&adc 0>, <&adc 1>, <&adc 2>,
+ <&adc 3>, <&adc 4>, <&adc 5>,
+ <&adc 6>, <&adc 7>, <&adc 8>,
+ <&adc 9>;
+ };
+
+ some_consumer {
+ compatible = "some-consumer";
+ io-channels = <&adc 10>, <&adc 11>;
+ io-channel-names = "adc1", "adc2";
+ };
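
Sketch of the zero-cell case mentioned under "IIO consumers": when the
provider sets #io-channel-cells to 0, each io-channels entry is just the
phandle. Both nodes, their names and the compatible strings are hypothetical.

```dts
/* Hypothetical provider with a single IIO output */
vref: voltage-monitor {
	compatible = "some-vendor,single-channel-adc";	/* placeholder */
	#io-channel-cells = <0>;
};

/* Hypothetical consumer: no specifier cells follow the phandle */
some_consumer {
	compatible = "some-consumer";
	io-channels = <&vref>;
	io-channel-names = "vref";
};
```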
diff --git a/Documentation/devicetree/bindings/media/coda.txt b/Documentation/devicetree/bindings/media/coda.txt
new file mode 100644
index 000000000000..2865d04e4030
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/coda.txt
@@ -0,0 +1,30 @@
+Chips&Media Coda multi-standard codec IP
+========================================
+
+Coda codec IPs are present in i.MX SoCs in various versions,
+called VPU (Video Processing Unit).
+
+Required properties:
+- compatible : should be "fsl,<chip>-vpu" for i.MX SoCs:
+ (a) "fsl,imx27-vpu" for CodaDx6 present in i.MX27
+ (b) "fsl,imx53-vpu" for CODA7541 present in i.MX53
+ (c) "fsl,imx6q-vpu" for CODA960 present in i.MX6q
+- reg: should be register base and length as documented in the
+ SoC reference manual
+- interrupts : Should contain the VPU interrupt. For CODA960,
+ a second interrupt is needed for the MJPEG unit.
+- clocks : Should contain the ahb and per clocks, in the order
+ determined by the clock-names property.
+- clock-names : Should be "ahb", "per"
+- iram : phandle pointing to the SRAM device node
+
+Example:
+
+vpu: vpu@63ff4000 {
+ compatible = "fsl,imx53-vpu";
+ reg = <0x63ff4000 0x1000>;
+ interrupts = <9>;
+ clocks = <&clks 63>, <&clks 63>;
+ clock-names = "ahb", "per";
+ iram = <&ocram>;
+};
diff --git a/Documentation/devicetree/bindings/metag/meta-intc.txt b/Documentation/devicetree/bindings/metag/meta-intc.txt
new file mode 100644
index 000000000000..8c47dcbfabc6
--- /dev/null
+++ b/Documentation/devicetree/bindings/metag/meta-intc.txt
@@ -0,0 +1,82 @@
+* Meta External Trigger Controller Binding
+
+This binding specifies what properties must be available in the device tree
+representation of a Meta external trigger controller.
+
+Required properties:
+
+ - compatible: Specifies the compatibility list for the interrupt controller.
+ The type shall be <string> and the value shall include "img,meta-intc".
+
+ - num-banks: Specifies the number of interrupt banks (each of which can
+ handle 32 interrupt sources).
+
+ - interrupt-controller: The presence of this property identifies the node
+ as an interrupt controller. No property value shall be defined.
+
+ - #interrupt-cells: Specifies the number of cells needed to encode an
+ interrupt source. The type shall be a <u32> and the value shall be 2.
+
+ - #address-cells: Specifies the number of cells needed to encode an
+ address. The type shall be <u32> and the value shall be 0. As such,
+ 'interrupt-map' nodes do not have to specify a parent unit address.
+
+Optional properties:
+
+ - no-mask: The controller doesn't have any mask registers.
+
+* Interrupt Specifier Definition
+
+ Interrupt specifiers consist of 2 cells encoded as follows:
+
+ - <1st-cell>: The interrupt-number that identifies the interrupt source.
+
+ - <2nd-cell>: The Linux interrupt flags containing level-sense information,
+ encoded as follows:
+ 1 = edge triggered
+ 4 = level-sensitive
+
+* Examples
+
+Example 1:
+
+ /*
+ * Meta external trigger block
+ */
+ intc: intc {
+ // This is an interrupt controller node.
+ interrupt-controller;
+
+ // No address cells so that 'interrupt-map' nodes which
+ // reference this interrupt controller node do not need a parent
+ // address specifier.
+ #address-cells = <0>;
+
+ // Two cells to encode interrupt sources.
+ #interrupt-cells = <2>;
+
+ // Number of interrupt banks
+ num-banks = <2>;
+
+ // No HWMASKEXT is available (specify on Chorus2 and Comet ES1)
+ no-mask;
+
+ // Compatible with Meta hardware trigger block.
+ compatible = "img,meta-intc";
+ };
+
+Example 2:
+
+ /*
+ * An interrupt generating device that is wired to a Meta external
+ * trigger block.
+ */
+ uart1: uart@0x02004c00 {
+ // Interrupt source '5' that is level-sensitive.
+ // Note that there are only two cells as specified in the
+ // interrupt parent's '#interrupt-cells' property.
+ interrupts = <5 4 /* level */>;
+
+ // The interrupt controller that this device is wired to.
+ interrupt-parent = <&intc>;
+ };
diff --git a/Documentation/devicetree/bindings/mfd/ab8500.txt b/Documentation/devicetree/bindings/mfd/ab8500.txt
index 13b707b7355c..c3a14e0ad0ad 100644
--- a/Documentation/devicetree/bindings/mfd/ab8500.txt
+++ b/Documentation/devicetree/bindings/mfd/ab8500.txt
@@ -13,9 +13,6 @@ Required parent device properties:
4 = active high level-sensitive
8 = active low level-sensitive
-Optional parent device properties:
-- reg : contains the PRCMU mailbox address for the AB8500 i2c port
-
The AB8500 consists of a large and varied group of sub-devices:
Device IRQ Names Supply Names Description
@@ -86,9 +83,8 @@ Non-standard child device properties:
- stericsson,amic2-bias-vamic1 : Analoge Mic wishes to use a non-standard Vamic
- stericsson,earpeice-cmv : Earpeice voltage (only: 950 | 1100 | 1270 | 1580)
-ab8500@5 {
+ab8500 {
compatible = "stericsson,ab8500";
- reg = <5>; /* mailbox 5 is i2c */
interrupts = <0 40 0x4>;
interrupt-controller;
#interrupt-cells = <2>;
diff --git a/Documentation/devicetree/bindings/mfd/mc13xxx.txt b/Documentation/devicetree/bindings/mfd/mc13xxx.txt
index baf07987ae68..abd9e3cb2db7 100644
--- a/Documentation/devicetree/bindings/mfd/mc13xxx.txt
+++ b/Documentation/devicetree/bindings/mfd/mc13xxx.txt
@@ -10,10 +10,40 @@ Optional properties:
- fsl,mc13xxx-uses-touch : Indicate the touchscreen controller is being used
Sub-nodes:
-- regulators : Contain the regulator nodes. The MC13892 regulators are
- bound using their names as listed below with their registers and bits
- for enabling.
+- regulators : Contain the regulator nodes. The regulators are bound using
+ their names as listed below with their registers and bits for enabling.
+MC13783 regulators:
+ sw1a : regulator SW1A (register 24, bit 0)
+ sw1b : regulator SW1B (register 25, bit 0)
+ sw2a : regulator SW2A (register 26, bit 0)
+ sw2b : regulator SW2B (register 27, bit 0)
+ sw3 : regulator SW3 (register 29, bit 20)
+ vaudio : regulator VAUDIO (register 32, bit 0)
+ viohi : regulator VIOHI (register 32, bit 3)
+ violo : regulator VIOLO (register 32, bit 6)
+ vdig : regulator VDIG (register 32, bit 9)
+ vgen : regulator VGEN (register 32, bit 12)
+ vrfdig : regulator VRFDIG (register 32, bit 15)
+ vrfref : regulator VRFREF (register 32, bit 18)
+ vrfcp : regulator VRFCP (register 32, bit 21)
+ vsim : regulator VSIM (register 33, bit 0)
+ vesim : regulator VESIM (register 33, bit 3)
+ vcam : regulator VCAM (register 33, bit 6)
+ vrfbg : regulator VRFBG (register 33, bit 9)
+ vvib : regulator VVIB (register 33, bit 11)
+ vrf1 : regulator VRF1 (register 33, bit 12)
+ vrf2 : regulator VRF2 (register 33, bit 15)
+ vmmc1 : regulator VMMC1 (register 33, bit 18)
+ vmmc2 : regulator VMMC2 (register 33, bit 21)
+ gpo1 : regulator GPO1 (register 34, bit 6)
+ gpo2 : regulator GPO2 (register 34, bit 8)
+ gpo3 : regulator GPO3 (register 34, bit 10)
+ gpo4 : regulator GPO4 (register 34, bit 12)
+ pwgt1spi : regulator PWGT1SPI (register 34, bit 15)
+ pwgt2spi : regulator PWGT2SPI (register 34, bit 16)
+
+MC13892 regulators:
vcoincell : regulator VCOINCELL (register 13, bit 23)
sw1 : regulator SW1 (register 24, bit 0)
sw2 : regulator SW2 (register 25, bit 0)
diff --git a/Documentation/devicetree/bindings/mips/cpu_irq.txt b/Documentation/devicetree/bindings/mips/cpu_irq.txt
new file mode 100644
index 000000000000..13aa4b62c62a
--- /dev/null
+++ b/Documentation/devicetree/bindings/mips/cpu_irq.txt
@@ -0,0 +1,47 @@
+MIPS CPU interrupt controller
+
+On MIPS the mips_cpu_intc_init() helper can be used to initialize the 8 CPU
+IRQs from a devicetree file and create an irq_domain for the IRQ controller.
+
+With the irq_domain in place we can describe how the 8 IRQs are wired to the
+platform's internal interrupt controller cascade.
+
+Below is an example of a platform describing the cascade inside the devicetree
+and the code used to load it inside arch_init_irq().
+
+Required properties:
+- compatible : Should be "mti,cpu-interrupt-controller"
+
+Example devicetree:
+ cpuintc: cpu-irq@0 {
+ #address-cells = <0>;
+
+ interrupt-controller;
+ #interrupt-cells = <1>;
+
+ compatible = "mti,cpu-interrupt-controller";
+ };
+
+ intc: intc@200 {
+ compatible = "ralink,rt2880-intc";
+ reg = <0x200 0x100>;
+
+ interrupt-controller;
+ #interrupt-cells = <1>;
+
+ interrupt-parent = <&cpuintc>;
+ interrupts = <2>;
+ };
+
+
+Example platform irq.c:
+static struct of_device_id __initdata of_irq_ids[] = {
+ { .compatible = "mti,cpu-interrupt-controller", .data = mips_cpu_intc_init },
+ { .compatible = "ralink,rt2880-intc", .data = intc_of_init },
+ {},
+};
+
+void __init arch_init_irq(void)
+{
+ of_irq_init(of_irq_ids);
+}
diff --git a/Documentation/devicetree/bindings/misc/sram.txt b/Documentation/devicetree/bindings/misc/sram.txt
new file mode 100644
index 000000000000..4d0a00e453a8
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/sram.txt
@@ -0,0 +1,16 @@
+Generic on-chip SRAM
+
+Simple IO memory regions to be managed by the genalloc API.
+
+Required properties:
+
+- compatible : mmio-sram
+
+- reg : SRAM iomem address range
+
+Example:
+
+sram: sram@5c000000 {
+ compatible = "mmio-sram";
+ reg = <0x5c000000 0x40000>; /* 256 KiB SRAM at address 0x5c000000 */
+};
diff --git a/Documentation/devicetree/bindings/mtd/elm.txt b/Documentation/devicetree/bindings/mtd/elm.txt
new file mode 100644
index 000000000000..8c1528c421d4
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/elm.txt
@@ -0,0 +1,16 @@
+Error location module
+
+Required properties:
+- compatible: Must be "ti,am33xx-elm"
+- reg: physical base address and size of the registers map.
+- interrupts: Interrupt number for the elm.
+
+Optional properties:
+- ti,hwmods: Name of the hwmod associated to the elm
+
+Example:
+elm: elm@0 {
+ compatible = "ti,am33xx-elm";
+ reg = <0x48080000 0x2000>;
+ interrupts = <4>;
+};
diff --git a/Documentation/devicetree/bindings/mtd/mtd-physmap.txt b/Documentation/devicetree/bindings/mtd/mtd-physmap.txt
index dab7847fc800..61c5ec850f2f 100644
--- a/Documentation/devicetree/bindings/mtd/mtd-physmap.txt
+++ b/Documentation/devicetree/bindings/mtd/mtd-physmap.txt
@@ -26,6 +26,9 @@ file systems on embedded devices.
- linux,mtd-name: allow to specify the mtd name for retro capability with
physmap-flash drivers as boot loader pass the mtd partition via the old
device name physmap-flash.
+ - use-advanced-sector-protection: boolean to enable support for the
+ advanced sector protection (Spansion: PPB - Persistent Protection
+ Bits) locking.
For JEDEC compatible devices, the following additional properties
are defined:
diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt
index 2c81e45f1374..08f0c3d01575 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt
@@ -1,7 +1,9 @@
One-register-per-pin type device tree based pinctrl driver
Required properties:
-- compatible : "pinctrl-single"
+- compatible : "pinctrl-single" or "pinconf-single".
+ "pinctrl-single" means that pinconf isn't supported.
+ "pinconf-single" means that generic pinconf is supported.
- reg : offset and length of the register set for the mux registers
@@ -14,9 +16,61 @@ Optional properties:
- pinctrl-single,function-off : function off mode for disabled state if
available and same for all registers; if not specified, disabling of
pin functions is ignored
+
- pinctrl-single,bit-per-mux : boolean to indicate that one register controls
more than one pin
+- pinctrl-single,drive-strength : array of values that are used to configure
+ drive strength in the pinmux register. The values are the drive strength
+ current and the drive strength mask.
+
+ /* drive strength current, mask */
+ pinctrl-single,drive-strength = <0x30 0xf0>;
+
+- pinctrl-single,bias-pullup : array of values that are used to configure the
+ input bias pullup in the pinmux register.
+
+ /* input, enabled pullup bits, disabled pullup bits, mask */
+ pinctrl-single,bias-pullup = <0 1 0 1>;
+
+- pinctrl-single,bias-pulldown : array of values that are used to configure the
+ input bias pulldown in the pinmux register.
+
+ /* input, enabled pulldown bits, disabled pulldown bits, mask */
+ pinctrl-single,bias-pulldown = <2 2 0 2>;
+
+ * Two bits to control input bias pullup and pulldown: use
+ pinctrl-single,bias-pullup and pinctrl-single,bias-pulldown. One bit
+ selects pullup and the other bit selects pulldown.
+ * Three bits to control input bias enable, pullup and pulldown: use
+ pinctrl-single,bias-pullup and pinctrl-single,bias-pulldown, and include
+ the input bias enable bit in the pullup or pulldown bits.
+ * Although the driver can set PIN_CONFIG_BIAS_DISABLE, there is no
+ pinctrl-single,bias-disable property, because the pinctrl-single driver
+ implements it by disabling both pullup and pulldown.
+
+- pinctrl-single,input-schmitt : array of values that are used to configure
+ input schmitt in the pinmux register. On some silicon there are two input
+ schmitt values (rising-edge & falling-edge) in the pinmux register.
+
+ /* input schmitt value, mask */
+ pinctrl-single,input-schmitt = <0x30 0x70>;
+
+- pinctrl-single,input-schmitt-enable : array of values that are used to
+ configure input schmitt enable or disable in the pinmux register.
+
+ /* input, enable bits, disable bits, mask */
+ pinctrl-single,input-schmitt-enable = <0x30 0x40 0 0x70>;
+
+- pinctrl-single,gpio-range : list of values that are used to configure a GPIO
+ range. The values are the subnode phandle, the pin base in the pinctrl
+ device, the number of pins in this range, and the GPIO function value of
+ this GPIO range. The number of parameters depends on the
+ #pinctrl-single,gpio-range-cells property.
+
+ /* pin base, nr pins & gpio function */
+ pinctrl-single,gpio-range = <&range 0 3 0 &range 3 9 1>;
+
This driver assumes that there is only one register for each pin (unless the
pinctrl-single,bit-per-mux is set), and uses the common pinctrl bindings as
specified in the pinctrl-bindings.txt document in this directory.
@@ -42,6 +96,20 @@ Where 0xdc is the offset from the pinctrl register base address for the
device pinctrl register, 0x18 is the desired value, and 0xff is the sub mask to
be used when applying this change to the register.
+
+Optional sub-node: In case some pins could be configured as GPIO in the pinmux
+register, those pins could be defined as a GPIO range. This sub-node is required
+by pinctrl-single,gpio-range property.
+
+Required properties in sub-node:
+- #pinctrl-single,gpio-range-cells : the number of parameters after phandle in
+ pinctrl-single,gpio-range property.
+
+ range: gpio-range {
+ #pinctrl-single,gpio-range-cells = <3>;
+ };
+
+
Example:
/* SoC common file */
@@ -58,7 +126,7 @@ pmx_core: pinmux@4a100040 {
/* second controller instance for pins in wkup domain */
pmx_wkup: pinmux@4a31e040 {
- compatible = "pinctrl-single;
+ compatible = "pinctrl-single";
reg = <0x4a31e040 0x0038>;
#address-cells = <1>;
#size-cells = <0>;
@@ -76,6 +144,29 @@ control_devconf0: pinmux@48002274 {
pinctrl-single,function-mask = <0x5F>;
};
+/* third controller instance for pins in gpio domain */
+pmx_gpio: pinmux@d401e000 {
+ compatible = "pinconf-single";
+ reg = <0xd401e000 0x0330>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ pinctrl-single,register-width = <32>;
+ pinctrl-single,function-mask = <7>;
+
+ /* sparse GPIO range could be supported */
+ pinctrl-single,gpio-range = <&range 0 3 0 &range 3 9 1
+ &range 12 1 0 &range 13 29 1
+ &range 43 1 0 &range 44 49 1
+ &range 94 1 1 &range 96 2 1>;
+
+ range: gpio-range {
+ #pinctrl-single,gpio-range-cells = <3>;
+ };
+};
+
+
/* board specific .dts file */
&pmx_core {
@@ -96,6 +187,15 @@ control_devconf0: pinmux@48002274 {
>;
};
+ uart0_pins: pinmux_uart0_pins {
+ pinctrl-single,pins = <
+ 0x208 0 /* UART0_RXD (IOCFG138) */
+ 0x20c 0 /* UART0_TXD (IOCFG139) */
+ >;
+ pinctrl-single,bias-pulldown = <0 2 2>;
+ pinctrl-single,bias-pullup = <0 1 1>;
+ };
+
/* map uart2 pins */
uart2_pins: pinmux_uart2_pins {
pinctrl-single,pins = <
@@ -122,6 +222,11 @@ control_devconf0: pinmux@48002274 {
};
+&uart1 {
+ pinctrl-names = "default";
+ pinctrl-0 = <&uart0_pins>;
+};
+
&uart2 {
pinctrl-names = "default";
pinctrl-0 = <&uart2_pins>;
diff --git a/Documentation/devicetree/bindings/pinctrl/samsung-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/samsung-pinctrl.txt
index 4598a47aa0cd..c70fca146e91 100644
--- a/Documentation/devicetree/bindings/pinctrl/samsung-pinctrl.txt
+++ b/Documentation/devicetree/bindings/pinctrl/samsung-pinctrl.txt
@@ -7,6 +7,7 @@ on-chip controllers onto these pads.
Required Properties:
- compatible: should be one of the following.
+ - "samsung,s3c64xx-pinctrl": for S3C64xx-compatible pin-controller.
- "samsung,exynos4210-pinctrl": for Exynos4210 compatible pin-controller.
- "samsung,exynos4x12-pinctrl": for Exynos4x12 compatible pin-controller.
- "samsung,exynos5250-pinctrl": for Exynos5250 compatible pin-controller.
@@ -105,6 +106,8 @@ B. External Wakeup Interrupts: For supporting external wakeup interrupts, a
- compatible: identifies the type of the external wakeup interrupt controller
The possible values are:
+ - samsung,s3c64xx-wakeup-eint: represents wakeup interrupt controller
+ found on Samsung S3C64xx SoCs.
- samsung,exynos4210-wakeup-eint: represents wakeup interrupt controller
found on Samsung Exynos4210 SoC.
- interrupt-parent: phandle of the interrupt parent to which the external
diff --git a/Documentation/devicetree/bindings/regulator/max8952.txt b/Documentation/devicetree/bindings/regulator/max8952.txt
new file mode 100644
index 000000000000..866fcdd0f4eb
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/max8952.txt
@@ -0,0 +1,52 @@
+Maxim MAX8952 voltage regulator
+
+Required properties:
+- compatible: must be equal to "maxim,max8952"
+- reg: I2C slave address, usually 0x60
+- max8952,dvs-mode-microvolt: array of 4 integer values defining DVS voltages
+ in microvolts. All values must be from range <770000, 1400000>
+- any required generic properties defined in regulator.txt
+
+Optional properties:
+- max8952,vid-gpios: array of two GPIO pins used for DVS voltage selection
+- max8952,en-gpio: GPIO used to control enable status of regulator
+- max8952,default-mode: index of default DVS voltage, from <0, 3> range
+- max8952,sync-freq: sync frequency, must be one of following values:
+ - 0: 26 MHz
+ - 1: 13 MHz
+ - 2: 19.2 MHz
+ Defaults to 26 MHz if not specified.
+- max8952,ramp-speed: voltage ramp speed, must be one of following values:
+ - 0: 32mV/us
+ - 1: 16mV/us
+ - 2: 8mV/us
+ - 3: 4mV/us
+ - 4: 2mV/us
+ - 5: 1mV/us
+ - 6: 0.5mV/us
+ - 7: 0.25mV/us
+ Defaults to 32mV/us if not specified.
+- any available generic properties defined in regulator.txt
+
+Example:
+
+ vdd_arm_reg: pmic@60 {
+ compatible = "maxim,max8952";
+ reg = <0x60>;
+
+ /* max8952-specific properties */
+ max8952,vid-gpios = <&gpx0 3 0>, <&gpx0 4 0>;
+ max8952,en-gpio = <&gpx0 1 0>;
+ max8952,default-mode = <0>;
+ max8952,dvs-mode-microvolt = <1250000>, <1200000>,
+ <1050000>, <950000>;
+ max8952,sync-freq = <0>;
+ max8952,ramp-speed = <0>;
+
+ /* generic regulator properties */
+ regulator-name = "vdd_arm";
+ regulator-min-microvolt = <770000>;
+ regulator-max-microvolt = <1400000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
diff --git a/Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt b/Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt
new file mode 100644
index 000000000000..2a3feabd3b22
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt
@@ -0,0 +1,15 @@
+Atmel AT91RM9200 Real Time Clock
+
+Required properties:
+- compatible: should be: "atmel,at91rm9200-rtc"
+- reg: physical base address of the controller and length of memory mapped
+ region.
+- interrupts: rtc alarm/event interrupt
+
+Example:
+
+rtc@fffffe00 {
+ compatible = "atmel,at91rm9200-rtc";
+ reg = <0xfffffe00 0x100>;
+ interrupts = <1 4 7>;
+};
diff --git a/Documentation/devicetree/bindings/serial/lantiq_asc.txt b/Documentation/devicetree/bindings/serial/lantiq_asc.txt
new file mode 100644
index 000000000000..5b78591aaa46
--- /dev/null
+++ b/Documentation/devicetree/bindings/serial/lantiq_asc.txt
@@ -0,0 +1,16 @@
+Lantiq SoC ASC serial controller
+
+Required properties:
+- compatible : Should be "lantiq,asc"
+- reg : Address and length of the register set for the device
+- interrupts: the 3 (tx rx err) interrupt numbers. The interrupt specifier
+ depends on the interrupt-parent interrupt controller.
+
+Example:
+
+asc1: serial@E100C00 {
+ compatible = "lantiq,asc";
+ reg = <0xE100C00 0x400>;
+ interrupt-parent = <&icu0>;
+ interrupts = <112 113 114>;
+};
diff --git a/Documentation/devicetree/bindings/spi/brcm,bcm2835-spi.txt b/Documentation/devicetree/bindings/spi/brcm,bcm2835-spi.txt
new file mode 100644
index 000000000000..8bf89c643640
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/brcm,bcm2835-spi.txt
@@ -0,0 +1,22 @@
+Broadcom BCM2835 SPI0 controller
+
+The BCM2835 contains two forms of SPI master controller, one known simply as
+SPI0, and the other known as the "Universal SPI Master", part of the
+auxiliary block. This binding applies to the SPI0 controller.
+
+Required properties:
+- compatible: Should be "brcm,bcm2835-spi".
+- reg: Should contain register location and length.
+- interrupts: Should contain interrupt.
+- clocks: The clock feeding the SPI controller.
+
+Example:
+
+spi@20204000 {
+ compatible = "brcm,bcm2835-spi";
+ reg = <0x7e204000 0x1000>;
+ interrupts = <2 22>;
+ clocks = <&clk_spi>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+};
diff --git a/Documentation/devicetree/bindings/spi/fsl-spi.txt b/Documentation/devicetree/bindings/spi/fsl-spi.txt
index 777abd7399d5..b032dd76e9d2 100644
--- a/Documentation/devicetree/bindings/spi/fsl-spi.txt
+++ b/Documentation/devicetree/bindings/spi/fsl-spi.txt
@@ -4,7 +4,7 @@ Required properties:
- cell-index : QE SPI subblock index.
0: QE subblock SPI1
1: QE subblock SPI2
-- compatible : should be "fsl,spi".
+- compatible : should be "fsl,spi" or "aeroflexgaisler,spictrl".
- mode : the SPI operation mode, it can be "cpu" or "cpu-qe".
- reg : Offset and length of the register set for the device
- interrupts : <a b> where a is the interrupt number and b is a
@@ -14,6 +14,7 @@ Required properties:
controller you have.
- interrupt-parent : the phandle for the interrupt controller that
services interrupts for this device.
+- clock-frequency : input clock frequency to non FSL_SOC cores
Optional properties:
- gpios : specifies the gpio pins to be used for chipselects.
diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt b/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt
new file mode 100644
index 000000000000..91ff771c7e77
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt
@@ -0,0 +1,26 @@
+NVIDIA Tegra114 SPI controller.
+
+Required properties:
+- compatible : should be "nvidia,tegra114-spi".
+- reg: Should contain SPI registers location and length.
+- interrupts: Should contain SPI interrupts.
+- nvidia,dma-request-selector : The Tegra DMA controller's phandle and
+ request selector for this SPI controller.
+- This binding also requires a clock named "spi" as per the binding document
+ Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+Recommended properties:
+- spi-max-frequency: Definition as per
+ Documentation/devicetree/bindings/spi/spi-bus.txt
+Example:
+
+spi@7000d600 {
+ compatible = "nvidia,tegra114-spi";
+ reg = <0x7000d600 0x200>;
+ interrupts = <0 82 0x04>;
+ nvidia,dma-request-selector = <&apbdma 16>;
+ spi-max-frequency = <25000000>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ status = "disabled";
+};
diff --git a/Documentation/devicetree/bindings/spi/spi-samsung.txt b/Documentation/devicetree/bindings/spi/spi-samsung.txt
index a15ffeddfba4..86aa061f069f 100644
--- a/Documentation/devicetree/bindings/spi/spi-samsung.txt
+++ b/Documentation/devicetree/bindings/spi/spi-samsung.txt
@@ -31,9 +31,6 @@ Required Board Specific Properties:
- #address-cells: should be 1.
- #size-cells: should be 0.
-- gpios: The gpio specifier for clock, mosi and miso interface lines (in the
- order specified). The format of the gpio specifier depends on the gpio
- controller.
Optional Board Specific Properties:
@@ -86,9 +83,8 @@ Example:
spi_0: spi@12d20000 {
#address-cells = <1>;
#size-cells = <0>;
- gpios = <&gpa2 4 2 3 0>,
- <&gpa2 6 2 3 0>,
- <&gpa2 7 2 3 0>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&spi0_bus>;
w25q80bw@0 {
#address-cells = <1>;
diff --git a/Documentation/devicetree/bindings/staging/dwc2.txt b/Documentation/devicetree/bindings/staging/dwc2.txt
new file mode 100644
index 000000000000..1a1b7cfa4845
--- /dev/null
+++ b/Documentation/devicetree/bindings/staging/dwc2.txt
@@ -0,0 +1,15 @@
+Platform DesignWare HS OTG USB 2.0 controller
+-----------------------------------------------------
+
+Required properties:
+- compatible : "snps,dwc2"
+- reg : Should contain 1 register range (address and length)
+- interrupts : Should contain 1 interrupt
+
+Example:
+
+ usb@101c0000 {
+ compatible = "ralink,rt3050-usb", "snps,dwc2";
+ reg = <0x101c0000 40000>;
+ interrupts = <18>;
+ };
diff --git a/Documentation/devicetree/bindings/staging/imx-drm/fsl-imx-drm.txt b/Documentation/devicetree/bindings/staging/imx-drm/fsl-imx-drm.txt
index 07654f0338b6..8071ac20d4b3 100644
--- a/Documentation/devicetree/bindings/staging/imx-drm/fsl-imx-drm.txt
+++ b/Documentation/devicetree/bindings/staging/imx-drm/fsl-imx-drm.txt
@@ -26,7 +26,7 @@ Required properties:
- crtc: the crtc this display is connected to, see below
Optional properties:
- interface_pix_fmt: How this display is connected to the
- crtc. Currently supported types: "rgb24", "rgb565"
+ crtc. Currently supported types: "rgb24", "rgb565", "bgr666"
- edid: verbatim EDID data block describing attached display.
- ddc: phandle describing the i2c bus handling the display data
channel
diff --git a/Documentation/devicetree/bindings/thermal/dove-thermal.txt b/Documentation/devicetree/bindings/thermal/dove-thermal.txt
new file mode 100644
index 000000000000..6f474677d472
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/dove-thermal.txt
@@ -0,0 +1,18 @@
+* Dove Thermal
+
+This driver is for Dove SoCs which contain a thermal sensor.
+
+Required properties:
+- compatible : "marvell,dove-thermal"
+- reg : Address range of the thermal registers
+
+The reg property should contain two ranges. The first is for the
+three Thermal Manager registers, while the second range contains the
+Thermal Diode Control Registers.
+
+Example:
+
+ thermal@10078 {
+ compatible = "marvell,dove-thermal";
+ reg = <0xd001c 0x0c>, <0xd005c 0x08>;
+ };
diff --git a/Documentation/devicetree/bindings/thermal/kirkwood-thermal.txt b/Documentation/devicetree/bindings/thermal/kirkwood-thermal.txt
new file mode 100644
index 000000000000..8c0f5eb86da7
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/kirkwood-thermal.txt
@@ -0,0 +1,15 @@
+* Kirkwood Thermal
+
+This version is for Kirkwood 88F6282 & 88F6283 SoCs. Other Kirkwood SoCs
+don't contain a thermal sensor.
+
+Required properties:
+- compatible : "marvell,kirkwood-thermal"
+- reg : Address range of the thermal registers
+
+Example:
+
+ thermal@10078 {
+ compatible = "marvell,kirkwood-thermal";
+ reg = <0x10078 0x4>;
+ };
diff --git a/Documentation/devicetree/bindings/thermal/rcar-thermal.txt b/Documentation/devicetree/bindings/thermal/rcar-thermal.txt
new file mode 100644
index 000000000000..28ef498a66e5
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/rcar-thermal.txt
@@ -0,0 +1,29 @@
+* Renesas R-Car Thermal
+
+Required properties:
+- compatible : "renesas,rcar-thermal"
+- reg : Address range of the thermal registers.
+ The 1st reg will be recognized as common register
+ if it has "interrupts".
+
+Optional properties:
+
+- interrupts : use interrupt
+
+Example (non interrupt support):
+
+thermal@e61f0100 {
+ compatible = "renesas,rcar-thermal";
+ reg = <0xe61f0100 0x38>;
+};
+
+Example (interrupt support):
+
+thermal@e61f0000 {
+ compatible = "renesas,rcar-thermal";
+ reg = <0xe61f0000 0x14
+ 0xe61f0100 0x38
+ 0xe61f0200 0x38
+ 0xe61f0300 0x38>;
+ interrupts = <0 69 4>;
+};
diff --git a/Documentation/devicetree/bindings/arm/armada-370-xp-timer.txt b/Documentation/devicetree/bindings/timer/marvell,armada-370-xp-timer.txt
index 64830118b013..36381129d141 100644
--- a/Documentation/devicetree/bindings/arm/armada-370-xp-timer.txt
+++ b/Documentation/devicetree/bindings/timer/marvell,armada-370-xp-timer.txt
@@ -1,10 +1,13 @@
-Marvell Armada 370 and Armada XP Global Timers
-----------------------------------------------
+Marvell Armada 370 and Armada XP Timers
+---------------------------------------
Required properties:
- compatible: Should be "marvell,armada-370-xp-timer"
-- interrupts: Should contain the list of Global Timer interrupts
-- reg: Should contain the base address of the Global Timer registers
+- interrupts: Should contain the list of Global Timer interrupts and
+ then local timer interrupts
+- reg: Should contain location and length for the timer registers. First
+ pair for the Global Timer registers, second pair for the
+ local/private timers.
- clocks: clock driving the timer hardware
Optional properties:
diff --git a/Documentation/devicetree/bindings/tty/serial/of-serial.txt b/Documentation/devicetree/bindings/tty/serial/of-serial.txt
index 1e1145ca4f3c..1928a3e83cd0 100644
--- a/Documentation/devicetree/bindings/tty/serial/of-serial.txt
+++ b/Documentation/devicetree/bindings/tty/serial/of-serial.txt
@@ -11,6 +11,9 @@ Required properties:
- "nvidia,tegra20-uart"
- "nxp,lpc3220-uart"
- "ibm,qpace-nwp-serial"
+ - "altr,16550-FIFO32"
+ - "altr,16550-FIFO64"
+ - "altr,16550-FIFO128"
- "serial" if the port type is unknown.
- reg : offset and length of the register set for the device.
- interrupts : should contain uart interrupt.
@@ -30,6 +33,10 @@ Optional properties:
RTAS and should not be registered.
- no-loopback-test: set to indicate that the port does not implements loopback
test mode
+- fifo-size: the fifo size of the UART.
+- auto-flow-control: one way to enable automatic flow control support. The
+ driver is allowed to detect support for the capability even without this
+ property.
Example:
diff --git a/Documentation/devicetree/bindings/usb/ci13xxx-imx.txt b/Documentation/devicetree/bindings/usb/ci13xxx-imx.txt
index 5778b9c83bd8..1c04a4c9515f 100644
--- a/Documentation/devicetree/bindings/usb/ci13xxx-imx.txt
+++ b/Documentation/devicetree/bindings/usb/ci13xxx-imx.txt
@@ -11,6 +11,7 @@ Optional properties:
that indicate usb controller index
- vbus-supply: regulator for vbus
- disable-over-current: disable over current detect
+- external-vbus-divider: enables off-chip resistor divider for Vbus
Examples:
usb@02184000 { /* USB OTG */
@@ -20,4 +21,5 @@ usb@02184000 { /* USB OTG */
fsl,usbphy = <&usbphy1>;
fsl,usbmisc = <&usbmisc 0>;
disable-over-current;
+ external-vbus-divider;
};
diff --git a/Documentation/devicetree/bindings/usb/ehci-omap.txt b/Documentation/devicetree/bindings/usb/ehci-omap.txt
new file mode 100644
index 000000000000..485a9a1efa7a
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/ehci-omap.txt
@@ -0,0 +1,32 @@
+OMAP HS USB EHCI controller
+
+This device is usually the child of the omap-usb-host
+Documentation/devicetree/bindings/mfd/omap-usb-host.txt
+
+Required properties:
+
+- compatible: should be "ti,ehci-omap"
+- reg: should contain one register range i.e. start and length
+- interrupts: description of the interrupt line
+
+Optional properties:
+
+- phys: list of phandles to PHY nodes.
+ This property is required if at least one of the ports is in
+ PHY mode i.e. OMAP_EHCI_PORT_MODE_PHY
+
+To specify the port mode, see
+Documentation/devicetree/bindings/mfd/omap-usb-host.txt
+
+Example for OMAP4:
+
+usbhsehci: ehci@4a064c00 {
+ compatible = "ti,ehci-omap", "usb-ehci";
+ reg = <0x4a064c00 0x400>;
+ interrupts = <0 77 0x4>;
+};
+
+&usbhsehci {
+ phys = <&hsusb1_phy 0 &hsusb3_phy>;
+};
+
diff --git a/Documentation/devicetree/bindings/usb/ohci-omap3.txt b/Documentation/devicetree/bindings/usb/ohci-omap3.txt
new file mode 100644
index 000000000000..14ab42812a8e
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/ohci-omap3.txt
@@ -0,0 +1,15 @@
+OMAP HS USB OHCI controller (OMAP3 and later)
+
+Required properties:
+
+- compatible: should be "ti,ohci-omap3"
+- reg: should contain one register range i.e. start and length
+- interrupts: description of the interrupt line
+
+Example for OMAP4:
+
+usbhsohci: ohci@4a064800 {
+ compatible = "ti,ohci-omap3", "usb-ohci";
+ reg = <0x4a064800 0x400>;
+ interrupts = <0 76 0x4>;
+};
diff --git a/Documentation/devicetree/bindings/usb/omap-usb.txt b/Documentation/devicetree/bindings/usb/omap-usb.txt
index 1ef0ce71f8fa..662f0f1d2315 100644
--- a/Documentation/devicetree/bindings/usb/omap-usb.txt
+++ b/Documentation/devicetree/bindings/usb/omap-usb.txt
@@ -8,10 +8,10 @@ OMAP MUSB GLUE
and disconnect.
- multipoint : Should be "1" indicating the musb controller supports
multipoint. This is a MUSB configuration-specific setting.
- - num_eps : Specifies the number of endpoints. This is also a
+ - num-eps : Specifies the number of endpoints. This is also a
MUSB configuration-specific setting. Should be set to "16"
- - ram_bits : Specifies the ram address size. Should be set to "12"
- - interface_type : This is a board specific setting to describe the type of
+ - ram-bits : Specifies the ram address size. Should be set to "12"
+ - interface-type : This is a board specific setting to describe the type of
interface between the controller and the phy. It should be "0" or "1"
specifying ULPI and UTMI respectively.
- mode : Should be "3" to represent OTG. "1" signifies HOST and "2"
@@ -29,18 +29,46 @@ usb_otg_hs: usb_otg_hs@4a0ab000 {
ti,hwmods = "usb_otg_hs";
ti,has-mailbox;
multipoint = <1>;
- num_eps = <16>;
- ram_bits = <12>;
+ num-eps = <16>;
+ ram-bits = <12>;
ctrl-module = <&omap_control_usb>;
};
Board specific device node entry
&usb_otg_hs {
- interface_type = <1>;
+ interface-type = <1>;
mode = <3>;
power = <50>;
};
+OMAP DWC3 GLUE
+ - compatible : Should be "ti,dwc3"
+ - ti,hwmods : Should be "usb_otg_ss"
+ - reg : Address and length of the register set for the device.
+ - interrupts : The irq number of this device that is used to interrupt the
+ MPU
+ - #address-cells, #size-cells : Must be present if the device has sub-nodes
+ - utmi-mode : controls the source of UTMI/PIPE status for VBUS and OTG ID.
+ It should be set to "1" for HW mode and "2" for SW mode.
+ - ranges: the child address space is mapped 1:1 onto the parent address space
+
+Sub-nodes:
+The dwc3 core should be added as a subnode of the omap dwc3 glue.
+- dwc3 :
+ The binding details of dwc3 can be found in:
+ Documentation/devicetree/bindings/usb/dwc3.txt
+
+omap_dwc3 {
+ compatible = "ti,dwc3";
+ ti,hwmods = "usb_otg_ss";
+ reg = <0x4a020000 0x1ff>;
+ interrupts = <0 93 4>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ utmi-mode = <2>;
+ ranges;
+};
+
OMAP CONTROL USB
Required properties:
diff --git a/Documentation/devicetree/bindings/usb/samsung-usbphy.txt b/Documentation/devicetree/bindings/usb/samsung-usbphy.txt
index 033194934f64..f575302e5173 100644
--- a/Documentation/devicetree/bindings/usb/samsung-usbphy.txt
+++ b/Documentation/devicetree/bindings/usb/samsung-usbphy.txt
@@ -1,20 +1,25 @@
-* Samsung's usb phy transceiver
+SAMSUNG USB-PHY controllers
-The Samsung's phy transceiver is used for controlling usb phy for
-s3c-hsotg as well as ehci-s5p and ohci-exynos usb controllers
-across Samsung SOCs.
+** Samsung's usb 2.0 phy transceiver
+
+Samsung's usb 2.0 phy transceiver is used for controlling the
+usb 2.0 phy for s3c-hsotg as well as ehci-s5p and ohci-exynos
+usb controllers across Samsung SOCs.
TODO: Adding the PHY binding with controller(s) according to the under
developement generic PHY driver.
Required properties:
Exynos4210:
-- compatible : should be "samsung,exynos4210-usbphy"
+- compatible : should be "samsung,exynos4210-usb2phy"
- reg : base physical address of the phy registers and length of memory mapped
region.
+- clocks: Clock IDs array as required by the controller.
+- clock-names: names of clocks corresponding to IDs in the clock property
+ as requested by the controller driver.
Exynos5250:
-- compatible : should be "samsung,exynos5250-usbphy"
+- compatible : should be "samsung,exynos5250-usb2phy"
- reg : base physical address of the phy registers and length of memory mapped
region.
@@ -44,12 +49,69 @@ Example:
usbphy@125B0000 {
#address-cells = <1>;
#size-cells = <1>;
- compatible = "samsung,exynos4210-usbphy";
+ compatible = "samsung,exynos4210-usb2phy";
reg = <0x125B0000 0x100>;
ranges;
+ clocks = <&clock 2>, <&clock 305>;
+ clock-names = "xusbxti", "otg";
+
usbphy-sys {
/* USB device and host PHY_CONTROL registers */
reg = <0x10020704 0x8>;
};
};
+
+
+** Samsung's usb 3.0 phy transceiver
+
+Starting with exynos5250, Samsung's SoCs have a usb 3.0 phy transceiver
+which is used for controlling the usb 3.0 phy for dwc3-exynos usb 3.0
+controllers across Samsung SOCs.
+
+Required properties:
+
+Exynos5250:
+- compatible : should be "samsung,exynos5250-usb3phy"
+- reg : base physical address of the phy registers and length of memory mapped
+ region.
+- clocks: Clock IDs array as required by the controller.
+- clock-names: names of clocks corresponding to IDs in the clock property
+ as requested by the controller driver.
+
+Optional properties:
+- #address-cells: should be '1' when usbphy node has a child node with 'reg'
+ property.
+- #size-cells: should be '1' when usbphy node has a child node with 'reg'
+ property.
+- ranges: allows valid translation between child's address space and parent's
+ address space.
+
+- The child node 'usbphy-sys' to the node 'usbphy' is for the system controller
+ interface for usb-phy. It should provide the following information required by
+ usb-phy controller to control phy.
+ - reg : base physical address of PHY_CONTROL registers.
+ The size of this register is the total sum of size of all PHY_CONTROL
+ registers that the SoC has. For example, the size will be
+ '0x4' in case we have only one PHY_CONTROL register (e.g.
+ OTHERS register in S3C64XX or USB_PHY_CONTROL register in S5PV210)
+ and '0x8' in case we have two PHY_CONTROL registers (e.g.
+ USBDEVICE_PHY_CONTROL and USBHOST_PHY_CONTROL registers in exynos4x),
+ and so on.
+
+Example:
+ usbphy@12100000 {
+ compatible = "samsung,exynos5250-usb3phy";
+ reg = <0x12100000 0x100>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ clocks = <&clock 1>, <&clock 286>;
+ clock-names = "ext_xtal", "usbdrd30";
+
+ usbphy-sys {
+ /* USB device and host PHY_CONTROL registers */
+ reg = <0x10040704 0x8>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt b/Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt
new file mode 100644
index 000000000000..d7e272671c7e
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/usb-nop-xceiv.txt
@@ -0,0 +1,34 @@
+USB NOP PHY
+
+Required properties:
+- compatible: should be usb-nop-xceiv
+
+Optional properties:
+- clocks: phandle to the PHY clock. Use as per
+ Documentation/devicetree/bindings/clock/clock-bindings.txt
+ This property is required if clock-frequency is specified.
+
+- clock-names: Should be "main_clk"
+
+- clock-frequency: the clock frequency (in Hz) that the PHY clock must
+ be configured to.
+
+- vcc-supply: phandle to the regulator that provides power to the PHY.
+
+- reset-supply: phandle to the regulator that provides RESET to the PHY.
+
+Example:
+
+ hsusb1_phy {
+ compatible = "usb-nop-xceiv";
+ clock-frequency = <19200000>;
+ clocks = <&osc 0>;
+ clock-names = "main_clk";
+ vcc-supply = <&hsusb1_vcc_regulator>;
+ reset-supply = <&hsusb1_reset_regulator>;
+ };
+
+hsusb1_phy is a NOP USB PHY device that gets its clock from an oscillator
+and expects that clock to be configured to 19.2MHz by the NOP PHY driver.
+hsusb1_vcc_regulator provides power to the PHY and hsusb1_reset_regulator
+controls RESET.
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 19e1ef73ab0d..4d1919bf2332 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -5,6 +5,7 @@ using them to avoid name-space collisions.
ad Avionic Design GmbH
adi Analog Devices, Inc.
+aeroflexgaisler Aeroflex Gaisler AB
ak Asahi Kasei Corp.
amcc Applied Micro Circuits Corporation (APM, formally AMCC)
apm Applied Micro Circuits Corporation (APM)
@@ -48,6 +49,7 @@ samsung Samsung Semiconductor
sbs Smart Battery System
schindler Schindler
sil Silicon Image
+silabs Silicon Laboratories
simtek
sirf SiRF Technology, Inc.
snps Synopsys, Inc.
diff --git a/Documentation/devicetree/bindings/video/backlight/lp855x.txt b/Documentation/devicetree/bindings/video/backlight/lp855x.txt
new file mode 100644
index 000000000000..1482103d288f
--- /dev/null
+++ b/Documentation/devicetree/bindings/video/backlight/lp855x.txt
@@ -0,0 +1,41 @@
+lp855x bindings
+
+Required properties:
+ - compatible: "ti,lp8550", "ti,lp8551", "ti,lp8552", "ti,lp8553",
+ "ti,lp8556", "ti,lp8557"
+ - reg: I2C slave address (u8)
+ - dev-ctrl: Value of DEVICE CONTROL register (u8). It depends on the device.
+
+Optional properties:
+ - bl-name: Backlight device name (string)
+ - init-brt: Initial value of backlight brightness (u8)
+ - pwm-period: PWM period value. Set only when PWM input mode is used (u32)
+ - rom-addr: Register address of ROM area to be updated (u8)
+ - rom-val: Register value to be updated (u8)
+
+Example:
+
+ /* LP8556 */
+ backlight@2c {
+ compatible = "ti,lp8556";
+ reg = <0x2c>;
+
+ bl-name = "lcd-bl";
+ dev-ctrl = /bits/ 8 <0x85>;
+ init-brt = /bits/ 8 <0x10>;
+ };
+
+ /* LP8557 */
+ backlight@2c {
+ compatible = "ti,lp8557";
+ reg = <0x2c>;
+
+ dev-ctrl = /bits/ 8 <0x41>;
+ init-brt = /bits/ 8 <0x0a>;
+
+ /* 4V OV, 4 output LED string enabled */
+ rom_14h {
+ rom-addr = /bits/ 8 <0x14>;
+ rom-val = /bits/ 8 <0xcf>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/video/backlight/tps65217-backlight.txt b/Documentation/devicetree/bindings/video/backlight/tps65217-backlight.txt
new file mode 100644
index 000000000000..5fb9279ac287
--- /dev/null
+++ b/Documentation/devicetree/bindings/video/backlight/tps65217-backlight.txt
@@ -0,0 +1,27 @@
+TPS65217 family of regulators
+
+The TPS65217 chip contains a boost converter and current sinks which can be
+used to drive LEDs for use as backlights.
+
+Required properties:
+- compatible: "ti,tps65217"
+- reg: I2C slave address
+- backlight: node for specifying WLED1 and WLED2 lines in TPS65217
+- isel: selection bit, valid values: 1 for ISEL1 (low-level) and 2 for ISEL2 (high-level)
+- fdim: PWM dimming frequency, valid values: 100, 200, 500, 1000
+- default-brightness: valid values: 0-100
+
+Each regulator is defined using the standard binding for regulators.
+
+Example:
+
+ tps: tps@24 {
+ reg = <0x24>;
+ compatible = "ti,tps65217";
+ backlight {
+ isel = <1>; /* 1 - ISET1, 2 - ISET2 */
+ fdim = <100>; /* TPS65217_BL_FDIM_100HZ */
+ default-brightness = <50>;
+ };
+ };
+
diff --git a/Documentation/devicetree/bindings/video/via,vt8500-fb.txt b/Documentation/devicetree/bindings/video/via,vt8500-fb.txt
index c870b6478ec8..2871e218a0fb 100644
--- a/Documentation/devicetree/bindings/video/via,vt8500-fb.txt
+++ b/Documentation/devicetree/bindings/video/via,vt8500-fb.txt
@@ -5,58 +5,32 @@ Required properties:
- compatible : "via,vt8500-fb"
- reg : Should contain 1 register ranges(address and length)
- interrupts : framebuffer controller interrupt
-- display: a phandle pointing to the display node
+- bits-per-pixel : bit depth of framebuffer (16 or 32)
-Required nodes:
-- display: a display node is required to initialize the lcd panel
- This should be in the board dts.
-- default-mode: a videomode within the display with timing parameters
- as specified below.
+Required subnodes:
+- display-timings: see display-timing.txt for information
Example:
- fb@d800e400 {
+ fb@d8050800 {
compatible = "via,vt8500-fb";
reg = <0xd800e400 0x400>;
interrupts = <12>;
- display = <&display>;
- default-mode = <&mode0>;
- };
-
-VIA VT8500 Display
------------------------------------------------------
-Required properties (as per of_videomode_helper):
-
- - hactive, vactive: Display resolution
- - hfront-porch, hback-porch, hsync-len: Horizontal Display timing parameters
- in pixels
- vfront-porch, vback-porch, vsync-len: Vertical display timing parameters in
- lines
- - clock: displayclock in Hz
- - bpp: lcd panel bit-depth.
- <16> for RGB565, <32> for RGB888
-
-Optional properties (as per of_videomode_helper):
- - width-mm, height-mm: Display dimensions in mm
- - hsync-active-high (bool): Hsync pulse is active high
- - vsync-active-high (bool): Vsync pulse is active high
- - interlaced (bool): This is an interlaced mode
- - doublescan (bool): This is a doublescan mode
+ bits-per-pixel = <16>;
-Example:
- display: display@0 {
- modes {
- mode0: mode@0 {
+ display-timings {
+ native-mode = <&timing0>;
+ timing0: 800x480 {
+ clock-frequency = <0>; /* unused but required */
hactive = <800>;
vactive = <480>;
- hback-porch = <88>;
hfront-porch = <40>;
+ hback-porch = <88>;
hsync-len = <0>;
vback-porch = <32>;
vfront-porch = <11>;
vsync-len = <1>;
- clock = <0>; /* unused but required */
- bpp = <16>; /* non-standard but required */
};
};
};
+
diff --git a/Documentation/devicetree/bindings/video/wm,wm8505-fb.txt b/Documentation/devicetree/bindings/video/wm,wm8505-fb.txt
index 3d325e1d11ee..0bcadb2840a5 100644
--- a/Documentation/devicetree/bindings/video/wm,wm8505-fb.txt
+++ b/Documentation/devicetree/bindings/video/wm,wm8505-fb.txt
@@ -4,20 +4,30 @@ Wondermedia WM8505 Framebuffer
Required properties:
- compatible : "wm,wm8505-fb"
- reg : Should contain 1 register ranges(address and length)
-- via,display: a phandle pointing to the display node
+- bits-per-pixel : bit depth of framebuffer (16 or 32)
-Required nodes:
-- display: a display node is required to initialize the lcd panel
- This should be in the board dts. See definition in
- Documentation/devicetree/bindings/video/via,vt8500-fb.txt
-- default-mode: a videomode node as specified in
- Documentation/devicetree/bindings/video/via,vt8500-fb.txt
+Required subnodes:
+- display-timings: see display-timing.txt for information
Example:
- fb@d8050800 {
+ fb@d8051700 {
compatible = "wm,wm8505-fb";
- reg = <0xd8050800 0x200>;
- display = <&display>;
- default-mode = <&mode0>;
+ reg = <0xd8051700 0x200>;
+ bits-per-pixel = <16>;
+
+ display-timings {
+ native-mode = <&timing0>;
+ timing0: 800x480 {
+ clock-frequency = <0>; /* unused but required */
+ hactive = <800>;
+ vactive = <480>;
+ hfront-porch = <40>;
+ hback-porch = <88>;
+ hsync-len = <0>;
+ vback-porch = <32>;
+ vfront-porch = <11>;
+ vsync-len = <1>;
+ };
+ };
};
diff --git a/Documentation/devicetree/bindings/w1/fsl-imx-owire.txt b/Documentation/devicetree/bindings/w1/fsl-imx-owire.txt
new file mode 100644
index 000000000000..ecf42c07684d
--- /dev/null
+++ b/Documentation/devicetree/bindings/w1/fsl-imx-owire.txt
@@ -0,0 +1,19 @@
+* Freescale i.MX One wire bus master controller
+
+Required properties:
+- compatible : should be "fsl,imx21-owire"
+- reg : Address and length of the register set for the device
+
+Optional properties:
+- clocks : phandle of clock that supplies the module (required if platform
+ clock bindings use device tree)
+
+Example:
+
+- From imx53.dtsi:
+owire: owire@63fa4000 {
+ compatible = "fsl,imx53-owire", "fsl,imx21-owire";
+ reg = <0x63fa4000 0x4000>;
+ clocks = <&clks 159>;
+ status = "disabled";
+};
diff --git a/Documentation/devicetree/bindings/watchdog/atmel-at91rm9200-wdt.txt b/Documentation/devicetree/bindings/watchdog/atmel-at91rm9200-wdt.txt
new file mode 100644
index 000000000000..d4d86cf8f9eb
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/atmel-at91rm9200-wdt.txt
@@ -0,0 +1,9 @@
+Atmel AT91RM9200 System Timer Watchdog
+
+Required properties:
+- compatible: must be "atmel,at91rm9200-wdt".
+
+Example:
+ watchdog@fffffd00 {
+ compatible = "atmel,at91rm9200-wdt";
+ };
diff --git a/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt b/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
index 2957ebb5aa71..fcdd48f7dcff 100644
--- a/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
@@ -7,9 +7,13 @@ Required properties:
- reg: physical base address of the controller and length of memory mapped
region.
+Optional properties:
+- timeout-sec: contains the watchdog timeout in seconds.
+
Example:
watchdog@fffffd40 {
compatible = "atmel,at91sam9260-wdt";
reg = <0xfffffd40 0x10>;
+ timeout-sec = <10>;
};
diff --git a/Documentation/devicetree/bindings/watchdog/marvel.txt b/Documentation/devicetree/bindings/watchdog/marvel.txt
index 0b2503ab0a05..5dc8d30061ce 100644
--- a/Documentation/devicetree/bindings/watchdog/marvel.txt
+++ b/Documentation/devicetree/bindings/watchdog/marvel.txt
@@ -5,10 +5,15 @@ Required Properties:
- Compatibility : "marvell,orion-wdt"
- reg : Address of the timer registers
+Optional properties:
+
+- timeout-sec : Contains the watchdog timeout in seconds
+
Example:
wdt@20300 {
compatible = "marvell,orion-wdt";
reg = <0x20300 0x28>;
+ timeout-sec = <10>;
status = "okay";
};
diff --git a/Documentation/devicetree/bindings/watchdog/pnx4008-wdt.txt b/Documentation/devicetree/bindings/watchdog/pnx4008-wdt.txt
index 7c7f6887c796..556d06c17c92 100644
--- a/Documentation/devicetree/bindings/watchdog/pnx4008-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/pnx4008-wdt.txt
@@ -5,9 +5,13 @@ Required properties:
- reg: physical base address of the controller and length of memory mapped
region.
+Optional properties:
+- timeout-sec: contains the watchdog timeout in seconds.
+
Example:
watchdog@4003C000 {
compatible = "nxp,pnx4008-wdt";
reg = <0x4003C000 0x1000>;
+ timeout-sec = <10>;
};
diff --git a/Documentation/devicetree/bindings/watchdog/qca-ar7130-wdt.txt b/Documentation/devicetree/bindings/watchdog/qca-ar7130-wdt.txt
new file mode 100644
index 000000000000..7a89e5f85415
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/qca-ar7130-wdt.txt
@@ -0,0 +1,13 @@
+* Qualcomm Atheros AR7130 Watchdog Timer (WDT) Controller
+
+Required properties:
+- compatible: must be "qca,ar7130-wdt"
+- reg: physical base address of the controller and length of memory mapped
+ region.
+
+Example:
+
+wdt@18060008 {
+ compatible = "qca,ar9330-wdt", "qca,ar7130-wdt";
+ reg = <0x18060008 0x8>;
+};
diff --git a/Documentation/devicetree/bindings/watchdog/samsung-wdt.txt b/Documentation/devicetree/bindings/watchdog/samsung-wdt.txt
index ce0d8e78ed8f..2aa486cc1ff6 100644
--- a/Documentation/devicetree/bindings/watchdog/samsung-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/samsung-wdt.txt
@@ -9,3 +9,6 @@ Required properties:
- reg : base physical address of the controller and length of memory mapped
region.
- interrupts : interrupt number to the cpu.
+
+Optional properties:
+- timeout-sec : contains the watchdog timeout in seconds.
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 0188903bc9e1..4966b1be42ac 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -302,7 +302,11 @@ Access to a dma_buf from the kernel context involves three steps:
void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
The vmap call can fail if there is no vmap support in the exporter, or if it
- runs out of vmalloc space. Fallback to kmap should be implemented.
+ runs out of vmalloc space. Fallback to kmap should be implemented. Note that
+ the dma-buf layer keeps a reference count for all vmap access and calls down
+ into the exporter's vmap function only when no vmapping exists, and only
+ unmaps it once. Protection against concurrent vmap/vunmap calls is provided
+ by taking the dma_buf->lock mutex.
3. Finish access
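Note on the dma-buf-sharing.txt hunk above: the reference counting it describes
can be modelled in a few lines. This is a minimal Python sketch, not kernel
code. The names VmapRefcount and FakeExporter are hypothetical, and a
threading.Lock stands in for the dma_buf->lock mutex.

```python
import threading


class FakeExporter:
    """Hypothetical exporter that counts how often its callbacks run."""

    def __init__(self):
        self.vmap_calls = 0
        self.vunmap_calls = 0

    def vmap(self):
        self.vmap_calls += 1
        return object()  # stands in for the kernel virtual address

    def vunmap(self, vaddr):
        self.vunmap_calls += 1


class VmapRefcount:
    """Model of the vmap reference counting described in the hunk above."""

    def __init__(self, exporter):
        self.exporter = exporter
        self.lock = threading.Lock()  # stands in for dma_buf->lock
        self.count = 0
        self.vaddr = None

    def vmap(self):
        with self.lock:
            if self.count == 0:
                # No mapping exists yet: call down into the exporter.
                self.vaddr = self.exporter.vmap()
            self.count += 1
            return self.vaddr

    def vunmap(self):
        with self.lock:
            if self.count == 0:
                raise RuntimeError("vunmap without matching vmap")
            self.count -= 1
            if self.count == 0:
                # Last user is gone: unmap exactly once.
                self.exporter.vunmap(self.vaddr)
                self.vaddr = None
```

In this model, two vmap calls followed by two vunmap calls reach the exporter
once each, which is the behaviour the added text promises to importers.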
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index d230dd9c99b0..4a93e98b290a 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -150,12 +150,28 @@ discard -- If set, issues discard/TRIM commands to the block
device when blocks are freed. This is useful for SSD devices
and sparse/thinly-provisoned LUNs.
-nfs -- This option maintains an index (cache) of directory
- inodes by i_logstart which is used by the nfs-related code to
- improve look-ups.
+nfs=stale_rw|nostale_ro
+ Enable this only if you want to export the FAT filesystem
+ over NFS.
+
+ stale_rw: This option maintains an index (cache) of directory
+ inodes by i_logstart which is used by the nfs-related code to
+ improve look-ups. Full file operations (read/write) over NFS are
+ supported, but cache eviction at the NFS server could
+ result in ESTALE errors.
+
+ nostale_ro: This option bases the inode number and filehandle
+ on the on-disk location of a file in the MS-DOS directory entry.
+ This ensures that ESTALE will not be returned after a file is
+ evicted from the inode cache. However, it means that operations
+ such as rename, create and unlink could cause filehandles that
+ previously pointed at one file to point at a different file,
+ potentially causing data corruption. For this reason, this
+ option also mounts the filesystem readonly.
+
+ To maintain backward compatibility, '-o nfs' is also accepted,
+ defaulting to stale_rw.
- Enable this only if you want to export the FAT filesystem
- over NFS
<bool>: 0,1,yes,no,true,false
diff --git a/Documentation/hwmon/adm1275 b/Documentation/hwmon/adm1275
index 2cfa25667123..15b4a20d5062 100644
--- a/Documentation/hwmon/adm1275
+++ b/Documentation/hwmon/adm1275
@@ -15,7 +15,7 @@ Supported chips:
Addresses scanned: -
Datasheet: www.analog.com/static/imported-files/data_sheets/ADM1276.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/adt7410 b/Documentation/hwmon/adt7410
index 96004000dc2a..9817941e5f19 100644
--- a/Documentation/hwmon/adt7410
+++ b/Documentation/hwmon/adt7410
@@ -4,28 +4,50 @@ Kernel driver adt7410
Supported chips:
* Analog Devices ADT7410
Prefix: 'adt7410'
- Addresses scanned: I2C 0x48 - 0x4B
+ Addresses scanned: None
Datasheet: Publicly available at the Analog Devices website
http://www.analog.com/static/imported-files/data_sheets/ADT7410.pdf
+ * Analog Devices ADT7420
+ Prefix: 'adt7420'
+ Addresses scanned: None
+ Datasheet: Publicly available at the Analog Devices website
+ http://www.analog.com/static/imported-files/data_sheets/ADT7420.pdf
+ * Analog Devices ADT7310
+ Prefix: 'adt7310'
+ Addresses scanned: None
+ Datasheet: Publicly available at the Analog Devices website
+ http://www.analog.com/static/imported-files/data_sheets/ADT7310.pdf
+ * Analog Devices ADT7320
+ Prefix: 'adt7320'
+ Addresses scanned: None
+ Datasheet: Publicly available at the Analog Devices website
+ http://www.analog.com/static/imported-files/data_sheets/ADT7320.pdf
Author: Hartmut Knaack <knaack.h@gmx.de>
Description
-----------
-The ADT7410 is a temperature sensor with rated temperature range of -55°C to
-+150°C. It has a high accuracy of +/-0.5°C and can be operated at a resolution
-of 13 bits (0.0625°C) or 16 bits (0.0078°C). The sensor provides an INT pin to
-indicate that a minimum or maximum temperature set point has been exceeded, as
-well as a critical temperature (CT) pin to indicate that the critical
-temperature set point has been exceeded. Both pins can be set up with a common
-hysteresis of 0°C - 15°C and a fault queue, ranging from 1 to 4 events. Both
-pins can individually set to be active-low or active-high, while the whole
-device can either run in comparator mode or interrupt mode. The ADT7410
-supports continous temperature sampling, as well as sampling one temperature
-value per second or even justget one sample on demand for power saving.
-Besides, it can completely power down its ADC, if power management is
-required.
+The ADT7310/ADT7410 is a temperature sensor with rated temperature range of
+-55°C to +150°C. It has a high accuracy of +/-0.5°C and can be operated at a
+resolution of 13 bits (0.0625°C) or 16 bits (0.0078°C). The sensor provides an
+INT pin to indicate that a minimum or maximum temperature set point has been
+exceeded, as well as a critical temperature (CT) pin to indicate that the
+critical temperature set point has been exceeded. Both pins can be set up with a
+common hysteresis of 0°C - 15°C and a fault queue, ranging from 1 to 4 events.
+Both pins can individually be set to active-low or active-high, while the whole
+device can either run in comparator mode or interrupt mode. The ADT7410 supports
+continuous temperature sampling, as well as sampling one temperature value per
+second or taking just one sample on demand to save power. It can also
+completely power down its ADC if power management is required.
+
+The ADT7320/ADT7420 is register compatible, the only differences being the
+package, a slightly narrower operating temperature range (-40°C to +150°C), and
+a better accuracy (0.25°C instead of 0.50°C.)
+
+The difference between the ADT7310/ADT7320 and ADT7410/ADT7420 is the control
+interface: the ADT7310 and ADT7320 use SPI while the ADT7410 and ADT7420 use
+I2C.
Configuration Notes
-------------------
diff --git a/Documentation/hwmon/jc42 b/Documentation/hwmon/jc42
index 165077121238..868d74d6b773 100644
--- a/Documentation/hwmon/jc42
+++ b/Documentation/hwmon/jc42
@@ -49,7 +49,7 @@ Supported chips:
Addresses scanned: I2C 0x18 - 0x1f
Author:
- Guenter Roeck <guenter.roeck@ericsson.com>
+ Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/lineage-pem b/Documentation/hwmon/lineage-pem
index 2ba5ed126858..83b2ddc160c8 100644
--- a/Documentation/hwmon/lineage-pem
+++ b/Documentation/hwmon/lineage-pem
@@ -8,7 +8,7 @@ Supported devices:
Documentation:
http://www.lineagepower.com/oem/pdf/CPLI2C.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/lm25066 b/Documentation/hwmon/lm25066
index a21db81c4591..c1b57d72efc3 100644
--- a/Documentation/hwmon/lm25066
+++ b/Documentation/hwmon/lm25066
@@ -1,7 +1,13 @@
-Kernel driver max8688
+Kernel driver lm25066
=====================
Supported chips:
+ * TI LM25056
+ Prefix: 'lm25056'
+ Addresses scanned: -
+ Datasheets:
+ http://www.ti.com/lit/gpn/lm25056
+ http://www.ti.com/lit/gpn/lm25056a
* National Semiconductor LM25066
Prefix: 'lm25066'
Addresses scanned: -
@@ -19,14 +25,15 @@ Supported chips:
Datasheet:
http://www.national.com/pf/LM/LM5066.html
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
-----------
-This driver supports hardware montoring for National Semiconductor LM25066,
-LM5064, and LM5064 Power Management, Monitoring, Control, and Protection ICs.
+This driver supports hardware monitoring for National Semiconductor / TI LM25056,
+LM25066, LM5064, and LM5066 Power Management, Monitoring, Control, and
+Protection ICs.
The driver is a client driver to the core PMBus driver. Please see
Documentation/hwmon/pmbus for details on PMBus client drivers.
@@ -60,14 +67,19 @@ in1_max Maximum input voltage.
in1_min_alarm Input voltage low alarm.
in1_max_alarm Input voltage high alarm.
-in2_label "vout1"
-in2_input Measured output voltage.
-in2_average Average measured output voltage.
-in2_min Minimum output voltage.
-in2_min_alarm Output voltage low alarm.
-
-in3_label "vout2"
-in3_input Measured voltage on vaux pin
+in2_label "vmon"
+in2_input Measured voltage on VAUX pin
+in2_min Minimum VAUX voltage (LM25056 only).
+in2_max Maximum VAUX voltage (LM25056 only).
+in2_min_alarm VAUX voltage low alarm (LM25056 only).
+in2_max_alarm VAUX voltage high alarm (LM25056 only).
+
+in3_label "vout1"
+ Not supported on LM25056.
+in3_input Measured output voltage.
+in3_average Average measured output voltage.
+in3_min Minimum output voltage.
+in3_min_alarm Output voltage low alarm.
curr1_label "iin"
curr1_input Measured input current.
diff --git a/Documentation/hwmon/lm75 b/Documentation/hwmon/lm75
index c91a1d15fa28..69af1c7db6b7 100644
--- a/Documentation/hwmon/lm75
+++ b/Documentation/hwmon/lm75
@@ -23,7 +23,7 @@ Supported chips:
Datasheet: Publicly available at the Maxim website
http://www.maxim-ic.com/
* Microchip (TelCom) TCN75
- Prefix: 'lm75'
+ Prefix: 'tcn75'
Addresses scanned: none
Datasheet: Publicly available at the Microchip website
http://www.microchip.com/
diff --git a/Documentation/hwmon/lm95234 b/Documentation/hwmon/lm95234
new file mode 100644
index 000000000000..a0e95ddfd372
--- /dev/null
+++ b/Documentation/hwmon/lm95234
@@ -0,0 +1,36 @@
+Kernel driver lm95234
+=====================
+
+Supported chips:
+ * National Semiconductor / Texas Instruments LM95234
+ Addresses scanned: I2C 0x18, 0x4d, 0x4e
+ Datasheet: Publicly available at the Texas Instruments website
+ http://www.ti.com/product/lm95234
+
+
+Author: Guenter Roeck <linux@roeck-us.net>
+
+Description
+-----------
+
+LM95234 is an 11-bit digital temperature sensor with a 2-wire System Management
+Bus (SMBus) interface and TruTherm technology that can very accurately monitor
+the temperature of four remote diodes as well as its own temperature.
+The four remote diodes can be external devices such as microprocessors,
+graphics processors or diode-connected 2N3904s. The LM95234's TruTherm
+beta compensation technology allows sensing of 90 nm or 65 nm process
+thermal diodes accurately.
+
+All temperature values are given in millidegrees Celsius. Temperature
+is provided within a range of -127 to +255 degrees (+127.875 degrees for
+the internal sensor). Resolution depends on temperature input and range.
+
+Each sensor has its own maximum limit, but the hysteresis is common to all
+channels. The hysteresis is configurable with the temp1_max_hyst attribute and
+affects the hysteresis on all channels. The first two external sensors also
+have a critical limit.
+
+The lm95234 driver can change its update interval to a fixed set of values.
+It will round up to the next selectable interval. See the datasheet for exact
+values. Reading sensor values more often will do no harm, but will return
+'old' values.
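Note on the lm95234 text above: the two numeric conventions it states
(millidegree values and rounding the update interval up to the next selectable
value) can be sketched as follows. This is an illustrative Python sketch only.
The interval list is hypothetical (the real values are in the datasheet and are
not reproduced here), and clamping requests above the largest interval is an
assumption.

```python
# Hypothetical set of selectable update intervals in milliseconds.
# These are placeholders, NOT the values from the LM95234 datasheet.
EXAMPLE_INTERVALS_MS = (100, 400, 1000, 2500)


def round_up_interval(requested_ms, selectable=EXAMPLE_INTERVALS_MS):
    """Return the smallest selectable interval >= requested_ms.

    Requests above the largest interval are clamped to it (an assumption).
    """
    for interval in sorted(selectable):
        if interval >= requested_ms:
            return interval
    return max(selectable)


def millidegrees_to_celsius(raw):
    """Convert a sysfs temperature string such as '45125' to degrees Celsius."""
    return int(raw.strip()) / 1000.0
```

A caller would pass the string read from a tempX_input file to
millidegrees_to_celsius() and treat the result as degrees Celsius.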
diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978
index c365f9beb5dd..dc0d08c61305 100644
--- a/Documentation/hwmon/ltc2978
+++ b/Documentation/hwmon/ltc2978
@@ -2,24 +2,32 @@ Kernel driver ltc2978
=====================
Supported chips:
+ * Linear Technology LTC2974
+ Prefix: 'ltc2974'
+ Addresses scanned: -
+ Datasheet: http://www.linear.com/product/ltc2974
* Linear Technology LTC2978
Prefix: 'ltc2978'
Addresses scanned: -
- Datasheet: http://cds.linear.com/docs/Datasheet/2978fa.pdf
+ Datasheet: http://www.linear.com/product/ltc2978
* Linear Technology LTC3880
Prefix: 'ltc3880'
Addresses scanned: -
- Datasheet: http://cds.linear.com/docs/Datasheet/3880f.pdf
+ Datasheet: http://www.linear.com/product/ltc3880
+ * Linear Technology LTC3883
+ Prefix: 'ltc3883'
+ Addresses scanned: -
+ Datasheet: http://www.linear.com/product/ltc3883
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
-----------
-The LTC2978 is an octal power supply monitor, supervisor, sequencer and
-margin controller. The LTC3880 is a dual, PolyPhase DC/DC synchronous
-step-down switching regulator controller.
+LTC2974 is a quad digital power supply manager. LTC2978 is an octal power supply
+monitor. LTC3880 is a dual output poly-phase step-down DC/DC controller. LTC3883
+is a single phase step-down DC/DC controller.
Usage Notes
@@ -41,63 +49,90 @@ Sysfs attributes
in1_label "vin"
in1_input Measured input voltage.
in1_min Minimum input voltage.
-in1_max Maximum input voltage.
-in1_lcrit Critical minimum input voltage.
+in1_max Maximum input voltage. LTC2974 and LTC2978 only.
+in1_lcrit Critical minimum input voltage. LTC2974 and LTC2978
+ only.
in1_crit Critical maximum input voltage.
in1_min_alarm Input voltage low alarm.
-in1_max_alarm Input voltage high alarm.
-in1_lcrit_alarm Input voltage critical low alarm.
+in1_max_alarm Input voltage high alarm. LTC2974 and LTC2978 only.
+in1_lcrit_alarm Input voltage critical low alarm. LTC2974 and LTC2978
+ only.
in1_crit_alarm Input voltage critical high alarm.
-in1_lowest Lowest input voltage. LTC2978 only.
+in1_lowest Lowest input voltage. LTC2974 and LTC2978 only.
in1_highest Highest input voltage.
-in1_reset_history Reset history. Writing into this attribute will reset
- history for all attributes.
-
-in[2-9]_label "vout[1-8]". Channels 3 to 9 on LTC2978 only.
-in[2-9]_input Measured output voltage.
-in[2-9]_min Minimum output voltage.
-in[2-9]_max Maximum output voltage.
-in[2-9]_lcrit Critical minimum output voltage.
-in[2-9]_crit Critical maximum output voltage.
-in[2-9]_min_alarm Output voltage low alarm.
-in[2-9]_max_alarm Output voltage high alarm.
-in[2-9]_lcrit_alarm Output voltage critical low alarm.
-in[2-9]_crit_alarm Output voltage critical high alarm.
-in[2-9]_lowest Lowest output voltage. LTC2978 only.
-in[2-9]_highest Lowest output voltage.
-in[2-9]_reset_history Reset history. Writing into this attribute will reset
- history for all attributes.
-
-temp[1-3]_input Measured temperature.
+in1_reset_history Reset input voltage history.
+
+in[N]_label "vout[1-8]".
+ LTC2974: N=2-5
+ LTC2978: N=2-9
+ LTC3880: N=2-3
+ LTC3883: N=2
+in[N]_input Measured output voltage.
+in[N]_min Minimum output voltage.
+in[N]_max Maximum output voltage.
+in[N]_lcrit Critical minimum output voltage.
+in[N]_crit Critical maximum output voltage.
+in[N]_min_alarm Output voltage low alarm.
+in[N]_max_alarm Output voltage high alarm.
+in[N]_lcrit_alarm Output voltage critical low alarm.
+in[N]_crit_alarm Output voltage critical high alarm.
+in[N]_lowest Lowest output voltage. LTC2974 and LTC2978 only.
+in[N]_highest Highest output voltage.
+in[N]_reset_history Reset output voltage history.
+
+temp[N]_input Measured temperature.
+ On LTC2974, temp[1-4] report external temperatures,
+ and temp5 reports the chip temperature.
On LTC2978, only one temperature measurement is
- supported and reflects the internal temperature.
+ supported and reports the chip temperature.
On LTC3880, temp1 and temp2 report external
- temperatures, and temp3 reports the internal
- temperature.
-temp[1-3]_min Mimimum temperature.
-temp[1-3]_max Maximum temperature.
-temp[1-3]_lcrit Critical low temperature.
-temp[1-3]_crit Critical high temperature.
-temp[1-3]_min_alarm Chip temperature low alarm.
-temp[1-3]_max_alarm Chip temperature high alarm.
-temp[1-3]_lcrit_alarm Chip temperature critical low alarm.
-temp[1-3]_crit_alarm Chip temperature critical high alarm.
-temp[1-3]_lowest Lowest measured temperature. LTC2978 only.
-temp[1-3]_highest Highest measured temperature.
-temp[1-3]_reset_history Reset history. Writing into this attribute will reset
- history for all attributes.
-
-power[1-2]_label "pout[1-2]". LTC3880 only.
-power[1-2]_input Measured power.
-
-curr1_label "iin". LTC3880 only.
+ temperatures, and temp3 reports the chip temperature.
+ On LTC3883, temp1 reports an external temperature,
+ and temp2 reports the chip temperature.
+temp[N]_min Minimum temperature. LTC2974 and LTC2978 only.
+temp[N]_max Maximum temperature.
+temp[N]_lcrit Critical low temperature.
+temp[N]_crit Critical high temperature.
+temp[N]_min_alarm Temperature low alarm. LTC2974 and LTC2978 only.
+temp[N]_max_alarm Temperature high alarm.
+temp[N]_lcrit_alarm Temperature critical low alarm.
+temp[N]_crit_alarm Temperature critical high alarm.
+temp[N]_lowest Lowest measured temperature. LTC2974 and LTC2978 only.
+ Not supported for chip temperature sensor on LTC2974.
+temp[N]_highest Highest measured temperature. Not supported for chip
+ temperature sensor on LTC2974.
+temp[N]_reset_history Reset temperature history. Not supported for chip
+ temperature sensor on LTC2974.
+
+power1_label "pin". LTC3883 only.
+power1_input Measured input power.
+
+power[N]_label "pout[1-4]".
+ LTC2974: N=1-4
+ LTC2978: Not supported
+ LTC3880: N=1-2
+ LTC3883: N=2
+power[N]_input Measured output power.
+
+curr1_label "iin". LTC3880 and LTC3883 only.
curr1_input Measured input current.
curr1_max Maximum input current.
curr1_max_alarm Input current high alarm.
-
-curr[2-3]_label "iout[1-2]". LTC3880 only.
-curr[2-3]_input Measured input current.
-curr[2-3]_max Maximum input current.
-curr[2-3]_crit Critical input current.
-curr[2-3]_max_alarm Input current high alarm.
-curr[2-3]_crit_alarm Input current critical high alarm.
+curr1_highest Highest input current. LTC3883 only.
+curr1_reset_history Reset input current history. LTC3883 only.
+
+curr[N]_label "iout[1-4]".
+ LTC2974: N=1-4
+ LTC2978: not supported
+ LTC3880: N=2-3
+ LTC3883: N=2
+curr[N]_input Measured output current.
+curr[N]_max Maximum output current.
+curr[N]_crit Critical high output current.
+curr[N]_lcrit Critical low output current. LTC2974 only.
+curr[N]_max_alarm Output current high alarm.
+curr[N]_crit_alarm Output current critical high alarm.
+curr[N]_lcrit_alarm Output current critical low alarm. LTC2974 only.
+curr[N]_lowest Lowest output current. LTC2974 only.
+curr[N]_highest Highest output current.
+curr[N]_reset_history Reset output current history.
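Note on the ltc2978 attribute table above: the per-chip output voltage channel
ranges can be written down as a lookup table, for example to check which
in[N]_input files to expect. This is a minimal Python sketch. The helper names
are hypothetical, and the mapping from in[N] to "vout[N-1]" is inferred from
the table (in1 is "vin", in2-in9 are "vout[1-8]"), not stated explicitly.

```python
# Output voltage channel indices N for in[N]_*, copied from the table above.
VOUT_CHANNELS = {
    "ltc2974": range(2, 6),   # N=2-5
    "ltc2978": range(2, 10),  # N=2-9
    "ltc3880": range(2, 4),   # N=2-3
    "ltc3883": range(2, 3),   # N=2
}


def vout_input_attrs(chip):
    """Return the expected in[N]_input attribute names for a chip prefix."""
    return ["in%d_input" % n for n in VOUT_CHANNELS[chip]]


def vout_label(n):
    """Return the label assumed for in[N]: 'vout[N-1]', since in1 is 'vin'."""
    return "vout%d" % (n - 1)
```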
diff --git a/Documentation/hwmon/ltc4261 b/Documentation/hwmon/ltc4261
index eba2e2c4b94d..9378a75c6134 100644
--- a/Documentation/hwmon/ltc4261
+++ b/Documentation/hwmon/ltc4261
@@ -8,7 +8,7 @@ Supported chips:
Datasheet:
http://cds.linear.com/docs/Datasheet/42612fb.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/max16064 b/Documentation/hwmon/max16064
index f8b478076f6d..d59cc7829bec 100644
--- a/Documentation/hwmon/max16064
+++ b/Documentation/hwmon/max16064
@@ -7,7 +7,7 @@ Supported chips:
Addresses scanned: -
Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX16064.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/max16065 b/Documentation/hwmon/max16065
index c11f64a1f2ad..208a29e43010 100644
--- a/Documentation/hwmon/max16065
+++ b/Documentation/hwmon/max16065
@@ -24,7 +24,7 @@ Supported chips:
http://datasheets.maxim-ic.com/en/ds/MAX16070-MAX16071.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/max34440 b/Documentation/hwmon/max34440
index 47651ff341ae..37cbf472a19d 100644
--- a/Documentation/hwmon/max34440
+++ b/Documentation/hwmon/max34440
@@ -27,7 +27,7 @@ Supported chips:
Addresses scanned: -
Datasheet: http://datasheets.maximintegrated.com/en/ds/MAX34461.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/max8688 b/Documentation/hwmon/max8688
index fe849871df32..e78078638b91 100644
--- a/Documentation/hwmon/max8688
+++ b/Documentation/hwmon/max8688
@@ -7,7 +7,7 @@ Supported chips:
Addresses scanned: -
Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX8688.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/nct6775 b/Documentation/hwmon/nct6775
new file mode 100644
index 000000000000..4e9ef60e8c6c
--- /dev/null
+++ b/Documentation/hwmon/nct6775
@@ -0,0 +1,188 @@
+Note
+====
+
+This driver supersedes the NCT6775F and NCT6776F support in the W83627EHF
+driver.
+
+Kernel driver NCT6775
+=====================
+
+Supported chips:
+ * Nuvoton NCT5572D/NCT6771F/NCT6772F/NCT6775F/W83677HG-I
+ Prefix: 'nct6775'
+ Addresses scanned: ISA address retrieved from Super I/O registers
+ Datasheet: Available from Nuvoton upon request
+ * Nuvoton NCT5577D/NCT6776D/NCT6776F
+ Prefix: 'nct6776'
+ Addresses scanned: ISA address retrieved from Super I/O registers
+ Datasheet: Available from Nuvoton upon request
+ * Nuvoton NCT5532D/NCT6779D
+ Prefix: 'nct6779'
+ Addresses scanned: ISA address retrieved from Super I/O registers
+ Datasheet: Available from Nuvoton upon request
+
+Authors:
+ Guenter Roeck <linux@roeck-us.net>
+
+Description
+-----------
+
+This driver implements support for the Nuvoton NCT6775F, NCT6776F, and NCT6779D
+and compatible super I/O chips.
+
+The chips support up to 25 temperature monitoring sources. Up to 6 of those are
+direct temperature sensor inputs, the others are special sources such as PECI,
+PCH, and SMBUS. Depending on the chip type, 2 to 6 of the temperature sources
+can be monitored and compared against minimum, maximum, and critical
+temperatures. The driver reports up to 10 of the temperatures to the user.
+There are 4 to 5 fan rotation speed sensors, 8 to 15 analog voltage sensors,
+one VID, alarms with beep warnings (control unimplemented), and some automatic
+fan regulation strategies (plus manual fan control mode).
+
+The temperature sensor sources on all chips are configurable. The configured
+source for each of the temperature sensors is provided in tempX_label.
+
+Temperatures are measured in degrees Celsius and measurement resolution is
+either 1 degC or 0.5 degC, depending on the temperature source and
+configuration. An alarm is triggered when the temperature gets higher than
+the high limit; it stays on until the temperature falls below the hysteresis
+value. Alarms are only supported for temp1 to temp6, depending on the chip type.
+
+Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
+triggered if the rotation speed has dropped below a programmable limit. On
+NCT6775F, fan readings can be divided by a programmable divider (1, 2, 4, 8,
+16, 32, 64 or 128) to give the readings more range or accuracy; the other chips
+do not have a fan speed divider. The driver sets the most suitable fan divisor
+itself; specifically, it increases the divider value each time a fan speed
+reading returns an invalid value, and it reduces it if the fan speed reading
+is lower than optimal. Some fans might not be present because they share pins
+with other functions.
+
+Voltage sensors (also known as IN sensors) report their values in millivolts.
+An alarm is triggered if the voltage has crossed a programmable minimum
+or maximum limit.
+
+The driver supports automatic fan control mode known as Thermal Cruise.
+In this mode, the chip attempts to keep the measured temperature in a
+predefined temperature range. If the temperature goes out of range, fan
+is driven slower/faster to reach the predefined range again.
+
+The mode works for fan1-fan5.
+
+sysfs attributes
+----------------
+
+pwm[1-5] - this file stores PWM duty cycle or DC value (fan speed) in range:
+ 0 (lowest speed) to 255 (full)
+
+pwm[1-5]_enable - this file controls mode of fan/temperature control:
+ * 0 Fan control disabled (fans set to maximum speed)
+ * 1 Manual mode, write to pwm[1-5] any value 0-255
+ * 2 "Thermal Cruise" mode
+ * 3 "Fan Speed Cruise" mode
+ * 4 "Smart Fan III" mode (NCT6775F only)
+ * 5 "Smart Fan IV" mode
+
+pwm[1-5]_mode - controls if output is PWM or DC level
+ * 0 DC output
+ * 1 PWM output
+
+Common fan control attributes
+-----------------------------
+
+pwm[1-5]_temp_sel Temperature source. Value is temperature sensor index.
+ For example, select '1' for temp1_input.
+pwm[1-5]_weight_temp_sel
+ Secondary temperature source. Value is temperature
+ sensor index. For example, select '1' for temp1_input.
+ Set to 0 to disable secondary temperature control.
+
+If secondary temperature functionality is enabled, it is controlled with the
+following attributes.
+
+pwm[1-5]_weight_duty_step
+ Duty step size.
+pwm[1-5]_weight_temp_step
+ Temperature step size. With each step over
+ temp_step_base, the value of weight_duty_step is added
+ to the current pwm value.
+pwm[1-5]_weight_temp_step_base
+ Temperature at which secondary temperature control kicks
+ in.
+pwm[1-5]_weight_temp_step_tol
+ Temperature step tolerance.
+
+Thermal Cruise mode (2)
+-----------------------
+
+If the temperature is in the range defined by:
+
+pwm[1-5]_target_temp Target temperature, unit millidegree Celsius
+ (range 0 - 127000)
+pwm[1-5]_temp_tolerance
+ Target temperature tolerance, unit millidegree Celsius
+
+there are no changes to fan speed. Once the temperature leaves the interval, fan
+speed increases (if temperature is higher than desired) or decreases (if
+temperature is lower than desired), using the following limits and time
+intervals.
+
+pwm[1-5]_start fan pwm start value (range 1 - 255), to start fan
+ when the temperature is above defined range.
+pwm[1-5]_floor lowest fan pwm (range 0 - 255) if temperature is below
+ the defined range. If set to 0, the fan is expected to
+ stop if the temperature is below the defined range.
+pwm[1-5]_step_up_time milliseconds before fan speed is increased
+pwm[1-5]_step_down_time milliseconds before fan speed is decreased
+pwm[1-5]_stop_time how many milliseconds must elapse to switch
+ corresponding fan off (when the temperature was below
+ defined range).
+
+Speed Cruise mode (3)
+---------------------
+
+This mode tries to keep the fan speed constant.
+
+fan[1-5]_target Target fan speed
+fan[1-5]_tolerance
+ Target speed tolerance
+
+
+Untested; use at your own risk.
+
+Smart Fan IV mode (5)
+---------------------
+
+This mode offers multiple slopes to control the fan speed. The slopes can be
+controlled by setting the pwm and temperature attributes. When the temperature
+rises, the chip will calculate the DC/PWM output based on the current slope.
+There are up to seven data points depending on the chip type. Subsequent data
+points should be set to higher temperatures and higher pwm values to achieve
+higher fan speeds with increasing temperature. The last data point reflects
+critical temperature mode, in which the fans should run at full speed.
+
+pwm[1-5]_auto_point[1-7]_pwm
+ pwm value to be set if temperature reaches matching
+ temperature range.
+pwm[1-5]_auto_point[1-7]_temp
+ Temperature over which the matching pwm is enabled.
+pwm[1-5]_temp_tolerance
+ Temperature tolerance, unit millidegree Celsius
+pwm[1-5]_crit_temp_tolerance
+ Temperature tolerance for critical temperature,
+ unit millidegree Celsius
+
+pwm[1-5]_step_up_time milliseconds before fan speed is increased
+pwm[1-5]_step_down_time milliseconds before fan speed is decreased
+
+Usage Notes
+-----------
+
+On various ASUS boards with NCT6776F, it appears that CPUTIN is not really
+connected to anything and floats, or that it is connected to some non-standard
+temperature measurement device. As a result, the temperature reported on CPUTIN
+will not reflect a usable value. It often reports unreasonably high
+temperatures, and in some cases the reported temperature declines if the actual
+temperature increases (similar to the raw PECI temperature value - see PECI
+specification for details). CPUTIN should therefore be ignored on ASUS
+boards. The CPU temperature on ASUS boards is reported from PECI 0.
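The Smart Fan IV description above (data points with rising temperature and
pwm values) can be sketched in C as follows. This is only an illustration: it
assumes linear interpolation between adjacent data points, which the text does
not state, and the names struct auto_point and smart_fan_pwm() are hypothetical
and not part of the driver. The chip datasheet defines the real behavior.

```c
#include <stddef.h>

/* Hypothetical data point: temperature in millidegrees Celsius, pwm 0-255. */
struct auto_point {
	long temp;
	int pwm;
};

/*
 * Assumed behavior: at or below the first point return its pwm, at or
 * above the last point return its pwm, and interpolate linearly between
 * the two surrounding points otherwise.
 */
static int smart_fan_pwm(const struct auto_point *pts, size_t n, long temp)
{
	size_t i;

	if (n == 0)
		return 0;
	if (temp <= pts[0].temp)
		return pts[0].pwm;

	for (i = 1; i < n; i++) {
		if (temp < pts[i].temp) {
			long dt = pts[i].temp - pts[i - 1].temp;
			long dp = pts[i].pwm - pts[i - 1].pwm;

			return pts[i - 1].pwm +
			       (int)(dp * (temp - pts[i - 1].temp) / dt);
		}
	}
	return pts[n - 1].pwm;
}
```

With points (40000, 100), (60000, 200) and (80000, 255), a temperature halfway
between the first two points yields a pwm halfway between 100 and 200, and any
temperature past the last point yields the last pwm value.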
diff --git a/Documentation/hwmon/pmbus b/Documentation/hwmon/pmbus
index 3d3a0f97f966..cf756ed48ff9 100644
--- a/Documentation/hwmon/pmbus
+++ b/Documentation/hwmon/pmbus
@@ -34,7 +34,7 @@ Supported chips:
Addresses scanned: -
Datasheet: n.a.
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/sht15 b/Documentation/hwmon/sht15
index 02850bdfac18..778987d1856f 100644
--- a/Documentation/hwmon/sht15
+++ b/Documentation/hwmon/sht15
@@ -40,7 +40,7 @@ bits for humidity, or 12 bits for temperature and 8 bits for humidity.
The humidity calibration coefficients are programmed into an OTP memory on the
chip. These coefficients are used to internally calibrate the signals from the
sensors. Disabling the reload of those coefficients allows saving 10ms for each
-measurement and decrease power consumption, while loosing on precision.
+measurement and decrease power consumption, while losing on precision.
Some options may be set directly in the sht15_platform_data structure
or via sysfs attributes.
diff --git a/Documentation/hwmon/smm665 b/Documentation/hwmon/smm665
index 59e316140542..a341eeedab75 100644
--- a/Documentation/hwmon/smm665
+++ b/Documentation/hwmon/smm665
@@ -29,7 +29,7 @@ Supported chips:
http://www.summitmicro.com/prod_select/summary/SMM766/SMM766_2086.pdf
http://www.summitmicro.com/prod_select/summary/SMM766B/SMM766B_2122.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Module Parameters
diff --git a/Documentation/hwmon/tmp401 b/Documentation/hwmon/tmp401
index 9fc447249212..f91e3fa7e5ec 100644
--- a/Documentation/hwmon/tmp401
+++ b/Documentation/hwmon/tmp401
@@ -8,8 +8,16 @@ Supported chips:
Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp401.html
* Texas Instruments TMP411
Prefix: 'tmp411'
- Addresses scanned: I2C 0x4c
+ Addresses scanned: I2C 0x4c, 0x4d, 0x4e
Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp411.html
+ * Texas Instruments TMP431
+ Prefix: 'tmp431'
+ Addresses scanned: I2C 0x4c, 0x4d
+ Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp431.html
+ * Texas Instruments TMP432
+ Prefix: 'tmp432'
+ Addresses scanned: I2C 0x4c, 0x4d
+ Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp432.html
Authors:
Hans de Goede <hdegoede@redhat.com>
@@ -18,19 +26,19 @@ Authors:
Description
-----------
-This driver implements support for Texas Instruments TMP401 and
-TMP411 chips. These chips implements one remote and one local
-temperature sensor. Temperature is measured in degrees
+This driver implements support for Texas Instruments TMP401, TMP411,
+TMP431, and TMP432 chips. These chips implement one or two remote and
+one local temperature sensors. Temperature is measured in degrees
Celsius. Resolution of the remote sensor is 0.0625 degree. Local
sensor resolution can be set to 0.5, 0.25, 0.125 or 0.0625 degree (not
supported by the driver so far, so using the default resolution of 0.5
degree).
The driver provides the common sysfs-interface for temperatures (see
-/Documentation/hwmon/sysfs-interface under Temperatures).
+Documentation/hwmon/sysfs-interface under Temperatures).
-The TMP411 chip is compatible with TMP401. It provides some additional
-features.
+The TMP411 and TMP431 chips are compatible with TMP401. TMP411 provides
+some additional features.
* Minimum and Maximum temperature measured since power-on, chip-reset
@@ -40,3 +48,6 @@ features.
Exported via sysfs attribute temp_reset_history. Writing 1 to this
file triggers a reset.
+
+TMP432 is compatible with TMP401 and TMP431. It supports two external
+temperature sensors.
diff --git a/Documentation/hwmon/ucd9000 b/Documentation/hwmon/ucd9000
index 0df5f276505b..805e33edb978 100644
--- a/Documentation/hwmon/ucd9000
+++ b/Documentation/hwmon/ucd9000
@@ -11,7 +11,7 @@ Supported chips:
http://focus.ti.com/lit/ds/symlink/ucd9090.pdf
http://focus.ti.com/lit/ds/symlink/ucd90910.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/ucd9200 b/Documentation/hwmon/ucd9200
index fd7d07b1908a..1e8060e631bd 100644
--- a/Documentation/hwmon/ucd9200
+++ b/Documentation/hwmon/ucd9200
@@ -15,7 +15,7 @@ Supported chips:
http://focus.ti.com/lit/ds/symlink/ucd9246.pdf
http://focus.ti.com/lit/ds/symlink/ucd9248.pdf
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
diff --git a/Documentation/hwmon/zl6100 b/Documentation/hwmon/zl6100
index 3d924b6b59e9..33908a4d68ff 100644
--- a/Documentation/hwmon/zl6100
+++ b/Documentation/hwmon/zl6100
@@ -54,7 +54,7 @@ http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146401
http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146256
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
@@ -125,7 +125,7 @@ in2_label "vmon"
in2_input Measured voltage on VMON (ZL2004) or VDRV (ZL9101M,
ZL9117M) pin. Reported voltage is 16x the voltage on the
pin (adjusted internally by the chip).
-in2_lcrit Critical minumum VMON/VDRV Voltage.
+in2_lcrit Critical minimum VMON/VDRV Voltage.
in2_crit Critical maximum VMON/VDRV voltage.
in2_lcrit_alarm VMON/VDRV voltage critical low alarm.
in2_crit_alarm VMON/VDRV voltage critical high alarm.
diff --git a/Documentation/i2c/busses/i2c-diolan-u2c b/Documentation/i2c/busses/i2c-diolan-u2c
index 30fe4bb9a069..0d6018c316c7 100644
--- a/Documentation/i2c/busses/i2c-diolan-u2c
+++ b/Documentation/i2c/busses/i2c-diolan-u2c
@@ -5,7 +5,7 @@ Supported adapters:
Documentation:
http://www.diolan.com/i2c/u2c12.html
-Author: Guenter Roeck <guenter.roeck@ericsson.com>
+Author: Guenter Roeck <linux@roeck-us.net>
Description
-----------
diff --git a/Documentation/ia64/err_inject.txt b/Documentation/ia64/err_inject.txt
index 223e4f0582d0..9f651c181429 100644
--- a/Documentation/ia64/err_inject.txt
+++ b/Documentation/ia64/err_inject.txt
@@ -882,7 +882,7 @@ int err_inj()
cpu=parameters[i].cpu;
k = cpu%64;
j = cpu/64;
- mask[j]=1<<k;
+ mask[j] = 1UL << k;
if (sched_setaffinity(0, MASK_SIZE*8, mask)==-1) {
perror("Error sched_setaffinity:");
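The change from 1<<k to 1UL << k in the hunk above matters because the literal
1 has type int, so shifting it by a bit position at or beyond the width of int
is undefined in C. A minimal sketch of the same idea follows; the helper name
cpu_mask_set() and the BITS_PER_ULONG macro are hypothetical, and the sketch
derives the word size from unsigned long instead of hard-coding 64.

```c
#include <limits.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Set the bit for 'cpu' in an array of unsigned long words.
 * 1UL keeps the shift in unsigned long, so every bit position of the
 * word is valid; a plain int 1 would make the high positions undefined.
 */
static void cpu_mask_set(unsigned long *mask, unsigned int cpu)
{
	mask[cpu / BITS_PER_ULONG] |= 1UL << (cpu % BITS_PER_ULONG);
}
```

Unlike the documented example, which assigns a single bit to mask[j], this
sketch uses |= so that several CPUs can be accumulated in one mask.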
diff --git a/Documentation/input/alps.txt b/Documentation/input/alps.txt
index 3262b6e4d686..e544c7ff8cfa 100644
--- a/Documentation/input/alps.txt
+++ b/Documentation/input/alps.txt
@@ -3,10 +3,26 @@ ALPS Touchpad Protocol
Introduction
------------
-
-Currently the ALPS touchpad driver supports four protocol versions in use by
-ALPS touchpads, called versions 1, 2, 3, and 4. Information about the various
-protocol versions is contained in the following sections.
+Currently the ALPS touchpad driver supports five protocol versions in use by
+ALPS touchpads, called versions 1, 2, 3, 4 and 5.
+
+Since roughly mid-2010 several new ALPS touchpads have been released and
+integrated into a variety of laptops and netbooks. These new touchpads
+have enough behavior differences that the alps_model_data definition
+table, describing the properties of the different versions, is no longer
+adequate. The design choices were to re-define the alps_model_data
+table, with the risk of regression testing existing devices, or isolate
+the new devices outside of the alps_model_data table. The latter design
+choice was made. The new touchpad signatures are named: "Rushmore",
+"Pinnacle", and "Dolphin", which you will see in the alps.c code.
+For the purposes of this document, this group of ALPS touchpads will
+generically be called "new ALPS touchpads".
+
+We experimented with probing the ACPI interface _HID (Hardware ID)/_CID
+(Compatibility ID) definition as a way to uniquely identify the
+different ALPS variants but there did not appear to be a 1:1 mapping.
+In fact, it appeared to be an m:n mapping between the _HID and actual
+hardware type.
Detection
---------
@@ -20,9 +36,13 @@ If the E6 report is successful, the touchpad model is identified using the "E7
report" sequence: E8-E7-E7-E7-E9. The response is the model signature and is
matched against known models in the alps_model_data_array.
-With protocol versions 3 and 4, the E7 report model signature is always
-73-02-64. To differentiate between these versions, the response from the
-"Enter Command Mode" sequence must be inspected as described below.
+For older touchpads supporting protocol versions 3 and 4, the E7 report
+model signature is always 73-02-64. To differentiate between these
+versions, the response from the "Enter Command Mode" sequence must be
+inspected as described below.
+
+The new ALPS touchpads have an E7 signature of 73-03-50 or 73-03-0A but
+seem to be better differentiated by the EC Command Mode response.
Command Mode
------------
@@ -47,6 +67,14 @@ address of the register being read, and the third contains the value of the
register. Registers are written by writing the value one nibble at a time
using the same encoding used for addresses.
+For the new ALPS touchpads, the EC command is used to enter command
+mode. The response in the new ALPS touchpads is significantly different,
+and more important in determining the behavior. This code has been
+separated from the original alps_model_data table and put in the
+alps_identify function. For example, there seem to be two hardware init
+sequences for the "Dolphin" touchpads as determined by the second byte
+of the EC response.
+
Packet Format
-------------
@@ -187,3 +215,28 @@ There are several things worth noting here.
well.
So far no v4 devices with tracksticks have been encountered.
+
+ALPS Absolute Mode - Protocol Version 5
+---------------------------------------
+This is basically Protocol Version 3 but with different logic for packet
+decode. It uses the same alps_process_touchpad_packet_v3 call with a
+specialized decode_fields function pointer to correctly interpret the
+packets. This appears to only be used by the Dolphin devices.
+
+For single-touch, the 6-byte packet format is:
+
+ byte 0: 1 1 0 0 1 0 0 0
+ byte 1: 0 x6 x5 x4 x3 x2 x1 x0
+ byte 2: 0 y6 y5 y4 y3 y2 y1 y0
+ byte 3: 0 M R L 1 m r l
+ byte 4: y10 y9 y8 y7 x10 x9 x8 x7
+ byte 5: 0 z6 z5 z4 z3 z2 z1 z0
+
+For multi-touch (mt), the format is:
+
+ byte 0: 1 1 1 n3 1 n2 n1 x24
+ byte 1: 1 y7 y6 y5 y4 y3 y2 y1
+ byte 2: ? x2 x1 y12 y11 y10 y9 y8
+ byte 3: 0 x23 x22 x21 x20 x19 x18 x17
+ byte 4: 0 x9 x8 x7 x6 x5 x4 x3
+ byte 5: 0 x16 x15 x14 x13 x12 x11 x10
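The single-touch layout for protocol version 5 shown above can be read as
follows: the low seven bits of x and y come from bytes 1 and 2, bits 7-10 of
both are packed into byte 4, and z sits in byte 5. The sketch below only
illustrates that bit layout; struct alps_v5_touch and alps_v5_decode_st() are
hypothetical names, and button decoding from byte 3 is left out.

```c
#include <stdint.h>

/* Hypothetical container for one decoded single-touch report. */
struct alps_v5_touch {
	unsigned int x;	/* 11 bits: x10..x0 */
	unsigned int y;	/* 11 bits: y10..y0 */
	unsigned int z;	/* 7 bits: z6..z0 */
};

static struct alps_v5_touch alps_v5_decode_st(const uint8_t p[6])
{
	struct alps_v5_touch t;

	/* Byte 4 low nibble holds x10..x7, high nibble holds y10..y7. */
	t.x = (p[1] & 0x7f) | ((unsigned int)(p[4] & 0x0f) << 7);
	t.y = (p[2] & 0x7f) | ((unsigned int)(p[4] & 0xf0) << 3);
	t.z = p[5] & 0x7f;
	return t;
}
```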
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 3210540f8bd3..237acab169dd 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -131,6 +131,7 @@ Code Seq#(hex) Include File Comments
'H' 40-4F sound/hdspm.h conflict!
'H' 40-4F sound/hdsp.h conflict!
'H' 90 sound/usb/usx2y/usb_stream.h
+'H' A0 uapi/linux/usb/cdc-wdm.h
'H' C0-F0 net/bluetooth/hci.h conflict!
'H' C0-DF net/bluetooth/hidp/hidp.h conflict!
'H' C0-DF net/bluetooth/cmtp/cmtp.h conflict!
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 13f1aa09b938..9c7fd988e299 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -297,6 +297,7 @@ Boot into System Kernel
On ia64, 256M@256M is a generous value that typically works.
The region may be automatically placed on ia64, see the
dump-capture kernel config option notes above.
+ If sparse memory is used, the size should be rounded to GRANULE boundaries.
On s390x, typically use "crashkernel=xxM". The value of xx is dependent
on the memory consumption of the kdump system. In general this is not
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e567af39ee34..de12397b60a9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -44,6 +44,7 @@ parameter is applicable:
AVR32 AVR32 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
BLACKFIN Blackfin architecture is enabled.
+ CLK Common clock infrastructure is enabled.
DRM Direct Rendering Management support is enabled.
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
@@ -320,6 +321,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
on: enable for both 32- and 64-bit processes
off: disable for both 32- and 64-bit processes
+ alloc_snapshot [FTRACE]
+ Allocate the ftrace snapshot buffer on boot up when the
+ main buffer is allocated. This is handy if debugging
+ and you need to use tracing_snapshot() on boot up, and
+ do not want to use tracing_snapshot_alloc() as it needs
+ to be done where GFP_KERNEL allocations are allowed.
+
amd_iommu= [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
@@ -465,6 +473,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
cio_ignore= [S390]
See Documentation/s390/CommonIO for details.
+ clk_ignore_unused
+ [CLK]
+ Keep all clocks already enabled by bootloader on,
+ even if no driver has claimed them. This is useful
+ for debug and development, but should not be
+ needed on a platform with proper driver support.
+ For more information, see Documentation/clk.txt.
clock= [BUGS=X86-32, HW] gettimeofday clocksource override.
[Deprecated]
@@ -596,9 +611,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.
- crashkernel_low=size[KMG]
- [KNL, x86] parts under 4G.
-
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
@@ -606,6 +618,26 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.
+ crashkernel=size[KMG],high
+ [KNL, x86_64] range could be above 4G. Allow the kernel
+ to allocate the physical memory region from the top, so
+ it could be above 4G if the system has more than 4G of
+ RAM installed. Otherwise the memory region will be
+ allocated below 4G, if available.
+ It will be ignored if crashkernel=X is specified.
+ crashkernel=size[KMG],low
+ [KNL, x86_64] range under 4G. When crashkernel=X,high
+ is passed, the kernel could allocate the physical memory
+ region above 4G, which causes the second kernel to crash
+ on systems that require some amount of low memory, e.g.
+ swiotlb requires at least 64M+32K of low memory. The
+ kernel would try to allocate 72M below 4G automatically.
+ This option lets the user specify their own low range
+ under 4G for the second kernel instead.
+ 0: to disable low allocation.
+ It will be ignored when crashkernel=X,high is not used
+ or memory reserved is below 4G.
+
cs89x0_dma= [HW,NET]
Format: <dma>
@@ -788,6 +820,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
edd= [EDD]
Format: {"off" | "on" | "skip[mbr]"}
+ efi_no_storage_paranoia [EFI; X86]
+ With this parameter you can use more than 50% of
+ your EFI variable storage. Use this parameter only if
+ you are really sure that your UEFI does sane gc and
+ fulfills the spec; otherwise your board may brick.
+
eisa_irq_edge= [PARISC,HW]
See header of drivers/parisc/eisa.c.
@@ -978,6 +1016,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
If specified, z/VM IUCV HVC accepts connections
from listed z/VM user IDs only.
+ hwthread_map= [METAG] Comma-separated list of Linux cpu id to
+ hardware thread id mappings.
+ Format: <cpu>:<hwthread>
+
keep_bootcon [KNL]
Do not unregister boot console at start. This is only
useful for debugging when something happens in the window
@@ -1645,42 +1687,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
- movablemem_map=acpi
- [KNL,X86,IA-64,PPC] This parameter is similar to
- memmap except it specifies the memory map of
- ZONE_MOVABLE.
- This option inform the kernel to use Hot Pluggable bit
- in flags from SRAT from ACPI BIOS to determine which
- memory devices could be hotplugged. The corresponding
- memory ranges will be set as ZONE_MOVABLE.
- NOTE: Whatever node the kernel resides in will always
- be un-hotpluggable.
-
- movablemem_map=nn[KMG]@ss[KMG]
- [KNL,X86,IA-64,PPC] This parameter is similar to
- memmap except it specifies the memory map of
- ZONE_MOVABLE.
- If user specifies memory ranges, the info in SRAT will
- be ingored. And it works like the following:
- - If more ranges are all within one node, then from
- lowest ss to the end of the node will be ZONE_MOVABLE.
- - If a range is within a node, then from ss to the end
- of the node will be ZONE_MOVABLE.
- - If a range covers two or more nodes, then from ss to
- the end of the 1st node will be ZONE_MOVABLE, and all
- the rest nodes will only have ZONE_MOVABLE.
- If memmap is specified at the same time, the
- movablemem_map will be limited within the memmap
- areas. If kernelcore or movablecore is also specified,
- movablemem_map will have higher priority to be
- satisfied. So the administrator should be careful that
- the amount of movablemem_map areas are not too large.
- Otherwise kernel won't have enough memory to start.
- NOTE: We don't stop users specifying the node the
- kernel resides in as hotpluggable so that this
- option can be used as a workaround of firmware
- bugs.
-
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
@@ -2493,9 +2499,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
Invocation of these CPUs' RCU callbacks will
- be offloaded to "rcuoN" kthreads created for
- that purpose. This reduces OS jitter on the
+ be offloaded to "rcuox/N" kthreads created for
+ that purpose, where "x" is "b" for RCU-bh, "p"
+ for RCU-preempt, and "s" for RCU-sched, and "N"
+ is the CPU number. This reduces OS jitter on the
offloaded CPUs, which can be useful for HPC and
+
real-time workloads. It can also improve energy
efficiency for asymmetric multiprocessors.
@@ -2519,6 +2528,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
leaf rcu_node structure. Useful for very large
systems.
+ rcutree.jiffies_till_first_fqs= [KNL,BOOT]
+ Set delay from grace-period initialization to
+ first attempt to force quiescent states.
+ Units are jiffies, minimum value is zero,
+ and maximum value is HZ.
+
+ rcutree.jiffies_till_next_fqs= [KNL,BOOT]
+ Set delay between subsequent attempts to force
+ quiescent states. Units are jiffies, minimum
+ value is one, and maximum value is HZ.
+
rcutree.qhimark= [KNL,BOOT]
Set threshold of queued
RCU callbacks over which batch limiting is disabled.
@@ -2533,16 +2553,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
rcutree.rcu_cpu_stall_timeout= [KNL,BOOT]
Set timeout for RCU CPU stall warning messages.
- rcutree.jiffies_till_first_fqs= [KNL,BOOT]
- Set delay from grace-period initialization to
- first attempt to force quiescent states.
- Units are jiffies, minimum value is zero,
- and maximum value is HZ.
+ rcutree.rcu_idle_gp_delay= [KNL,BOOT]
+ Set wakeup interval for idle CPUs that have
+ RCU callbacks (RCU_FAST_NO_HZ=y).
- rcutree.jiffies_till_next_fqs= [KNL,BOOT]
- Set delay between subsequent attempts to force
- quiescent states. Units are jiffies, minimum
- value is one, and maximum value is HZ.
+ rcutree.rcu_idle_lazy_gp_delay= [KNL,BOOT]
+ Set wakeup interval for idle CPUs that have
+ only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y).
+ Lazy RCU callbacks are those which RCU can
+ prove do nothing more than free memory.
rcutorture.fqs_duration= [KNL,BOOT]
Set duration of force_quiescent_state bursts.
@@ -3254,6 +3273,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
or other driver-specific files in the
Documentation/watchdog/ directory.
+ workqueue.disable_numa
+ By default, all work items queued to unbound
+ workqueues are affine to the NUMA nodes they're
+ issued on, which results in better behavior in
+ general. If NUMA affinity needs to be disabled for
+ whatever reason, this option can be used. Note
+ that this also can be controlled per-workqueue for
+ workqueues visible under /sys/bus/workqueue/.
+
x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
supporting x2apic.
diff --git a/Documentation/metag/00-INDEX b/Documentation/metag/00-INDEX
new file mode 100644
index 000000000000..db11c513bd5c
--- /dev/null
+++ b/Documentation/metag/00-INDEX
@@ -0,0 +1,4 @@
+00-INDEX
+ - this file
+kernel-ABI.txt
+ - Documents metag ABI details
diff --git a/Documentation/metag/kernel-ABI.txt b/Documentation/metag/kernel-ABI.txt
new file mode 100644
index 000000000000..7b8dee83b9c1
--- /dev/null
+++ b/Documentation/metag/kernel-ABI.txt
@@ -0,0 +1,256 @@
+ ==========================
+ KERNEL ABIS FOR METAG ARCH
+ ==========================
+
+This document describes the Linux ABIs for the metag architecture, and has the
+following sections:
+
+ (*) Outline of registers
+ (*) Userland registers
+ (*) Kernel registers
+ (*) System call ABI
+ (*) Calling conventions
+
+
+====================
+OUTLINE OF REGISTERS
+====================
+
+The main Meta core registers are arranged in units:
+
+ UNIT Type DESCRIPTION GP EXT PRIV GLOBAL
+ ======= ======= =============== ======= ======= ======= =======
+ CT Special Control unit
+ D0 General Data unit 0 0-7 8-15 16-31 16-31
+ D1 General Data unit 1 0-7 8-15 16-31 16-31
+ A0 General Address unit 0 0-3 4-7 8-15 8-15
+ A1 General Address unit 1 0-3 4-7 8-15 8-15
+ PC Special PC unit 0 1
+ PORT Special Ports
+ TR Special Trigger unit 0-7
+ TT Special Trace unit 0-5
+ FX General FP unit 0-15
+
+GP registers form part of the main context.
+
+Extended context registers (EXT) may not be present on all hardware threads and
+can be context switched if support is enabled and the appropriate bits are set
+in e.g. the D0.8 register to indicate what extended state to preserve.
+
+Global registers are shared between threads and are privilege protected.
+
+See arch/metag/include/asm/metag_regs.h for definitions relating to core
+registers and the fields and bits they contain. See the TRMs for further details
+about special registers.
+
+Several special registers are preserved in the main context; these are the
+interesting ones:
+
+ REG (ALIAS) PURPOSE
+ ======================= ===============================================
+ CT.1 (TXMODE) Processor mode bits (particularly for DSP)
+ CT.2 (TXSTATUS) Condition flags and LSM_STEP (MGET/MSET step)
+ CT.3 (TXRPT) Branch repeat counter
+ PC.0 (PC) Program counter
+
+Some of the general registers have special purposes in the ABI and therefore
+have aliases:
+
+ D0 REG (ALIAS) PURPOSE D1 REG (ALIAS) PURPOSE
+ =============== =============== =============== =======================
+ D0.0 (D0Re0) 32bit result D1.0 (D1Re0) Top half of 64bit result
+ D0.1 (D0Ar6) Argument 6 D1.1 (D1Ar5) Argument 5
+ D0.2 (D0Ar4) Argument 4 D1.2 (D1Ar3) Argument 3
+ D0.3 (D0Ar2) Argument 2 D1.3 (D1Ar1) Argument 1
+ D0.4 (D0FrT) Frame temp D1.4 (D1RtP) Return pointer
+ D0.5 Call preserved D1.5 Call preserved
+ D0.6 Call preserved D1.6 Call preserved
+ D0.7 Call preserved D1.7 Call preserved
+
+ A0 REG (ALIAS) PURPOSE A1 REG (ALIAS) PURPOSE
+ =============== =============== =============== =======================
+ A0.0 (A0StP) Stack pointer A1.0 (A1GbP) Global base pointer
+ A0.1 (A0FrP) Frame pointer A1.1 (A1LbP) Local base pointer
+ A0.2 A1.2
+ A0.3 A1.3
+
+
+==================
+USERLAND REGISTERS
+==================
+
+All the general purpose D0, D1, A0, A1 registers are preserved when entering the
+kernel (including asynchronous events such as interrupts and timer ticks) except
+the following which have special purposes in the ABI:
+
+ REGISTERS WHEN STATUS PURPOSE
+ =============== ======= =============== ===============================
+ D0.8 DSP Preserved ECH, determines what extended
+ DSP state to preserve.
+ A0.0 (A0StP) ALWAYS Preserved Stack >= A0StP may be clobbered
+ at any time by the creation of a
+ signal frame.
+ A1.0 (A1GbP) SMP Clobbered Used as temporary for loading
+ kernel stack pointer and saving
+ core context.
+ A0.15 !SMP Protected Stores kernel stack pointer.
+ A1.15 ALWAYS Protected Stores kernel base pointer.
+
+On UP A0.15 is used to store the kernel stack pointer for storing the userland
+context. A0.15 is global between hardware threads though which means it cannot
+be used on SMP for this purpose. Since no protected local registers are
+available A1GbP is reserved for use as a temporary to allow a percpu stack
+pointer to be loaded for storing the rest of the context.
+
+
+================
+KERNEL REGISTERS
+================
+
+When in the kernel the following registers have special purposes in the ABI:
+
+ REGISTERS WHEN STATUS PURPOSE
+ =============== ======= =============== ===============================
+ A0.0 (A0StP) ALWAYS Preserved Stack >= A0StP may be clobbered
+ at any time by the creation of
+ an irq signal frame.
+ A1.0 (A1GbP) ALWAYS Preserved Reserved (kernel base pointer).
+
+
+===============
+SYSTEM CALL ABI
+===============
+
+When a system call is made, the following registers are effective:
+
+ REGISTERS CALL RETURN
+ =============== ======================= ===============================
+ D0.0 (D0Re0) Return value (or -errno)
+ D1.0 (D1Re0) System call number Clobbered
+ D0.1 (D0Ar6) Syscall arg #6 Preserved
+ D1.1 (D1Ar5) Syscall arg #5 Preserved
+ D0.2 (D0Ar4) Syscall arg #4 Preserved
+ D1.2 (D1Ar3) Syscall arg #3 Preserved
+ D0.3 (D0Ar2) Syscall arg #2 Preserved
+ D1.3 (D1Ar1) Syscall arg #1 Preserved
+
+Due to the limited number of argument registers and some system calls with badly
+aligned 64-bit arguments, 64-bit values are always packed in consecutive
+arguments, even if this is contrary to the normal calling conventions (where the
+two halves would go in a matching pair of data registers).
+
+For example fadvise64_64 usually has the signature:
+
+ long sys_fadvise64_64(i32 fd, i64 offs, i64 len, i32 advice);
+
+But for metag fadvise64_64 is wrapped so that the 64-bit arguments are packed:
+
+ long sys_fadvise64_64_metag(i32 fd, i32 offs_lo,
+ i32 offs_hi, i32 len_lo,
+ i32 len_hi, i32 advice)
+
+So the arguments are packed in the registers like this:
+
+ D0 REG (ALIAS) VALUE D1 REG (ALIAS) VALUE
+ =============== =============== =============== =======================
+ D0.1 (D0Ar6) advice D1.1 (D1Ar5) hi(len)
+ D0.2 (D0Ar4) lo(len) D1.2 (D1Ar3) hi(offs)
+ D0.3 (D0Ar2) lo(offs) D1.3 (D1Ar1) fd
+
+
+===================
+CALLING CONVENTIONS
+===================
+
+These calling conventions apply to both user and kernel code. The stack grows
+from low addresses to high addresses in the metag ABI. The stack pointer (A0StP)
+should always point to the next free address on the stack and should at all
+times be 64-bit aligned. The following registers are effective at the point of a
+call:
+
+ REGISTERS CALL RETURN
+ =============== ======================= ===============================
+ D0.0 (D0Re0) 32bit return value
+ D1.0 (D1Re0) Upper half of 64bit return value
+ D0.1 (D0Ar6) 32bit argument #6 Clobbered
+ D1.1 (D1Ar5) 32bit argument #5 Clobbered
+ D0.2 (D0Ar4) 32bit argument #4 Clobbered
+ D1.2 (D1Ar3) 32bit argument #3 Clobbered
+ D0.3 (D0Ar2) 32bit argument #2 Clobbered
+ D1.3 (D1Ar1) 32bit argument #1 Clobbered
+ D0.4 (D0FrT) Clobbered
+ D1.4 (D1RtP) Return pointer Clobbered
+ D{0-1}.{5-7} Preserved
+ A0.0 (A0StP) Stack pointer Preserved
+ A1.0 (A1GbP) Preserved
+ A0.1 (A0FrP) Frame pointer Preserved
+ A1.1 (A1LbP) Preserved
+ A{0-1}.{2-3} Clobbered
+
+64-bit arguments are placed in matching pairs of registers (i.e. the same
+register number in both D0 and D1 units), with the least significant half in D0
+and the most significant half in D1, leaving a gap where necessary. Further
+arguments are stored on the stack in reverse order (earlier arguments at higher
+addresses):
+
+ ADDRESS 0 1 2 3 4 5 6 7
+ =============== ===== ===== ===== ===== ===== ===== ===== =====
+ A0StP -->
+ A0StP-0x08 32bit argument #8 32bit argument #7
+ A0StP-0x10 32bit argument #10 32bit argument #9
+
+Function prologues tend to look a bit like this:
+
+ /* If frame pointer in use, move it to frame temp register so it can be
+ easily pushed onto stack */
+ MOV D0FrT,A0FrP
+
+ /* If frame pointer in use, set it to stack pointer */
+ ADD A0FrP,A0StP,#0
+
+ /* Preserve D0FrT, D1RtP, D{0-1}.{5-7} on stack, incrementing A0StP */
+ MSETL [A0StP++],D0FrT,D0.5,D0.6,D0.7
+
+ /* Allocate some stack space for local variables */
+ ADD A0StP,A0StP,#0x10
+
+At this point the stack would look like this:
+
+ ADDRESS 0 1 2 3 4 5 6 7
+ =============== ===== ===== ===== ===== ===== ===== ===== =====
+ A0StP -->
+ A0StP-0x08
+ A0StP-0x10
+ A0StP-0x18 Old D0.7 Old D1.7
+ A0StP-0x20 Old D0.6 Old D1.6
+ A0StP-0x28 Old D0.5 Old D1.5
+ A0FrP --> Old A0FrP (frame ptr) Old D1RtP (return ptr)
+ A0FrP-0x08 32bit argument #8 32bit argument #7
+ A0FrP-0x10 32bit argument #10 32bit argument #9
+
+Function epilogues tend to differ depending on the use of a frame pointer. An
+example of a frame pointer epilogue:
+
+ /* Restore D0FrT, D1RtP, D{0-1}.{5-7} from stack, incrementing A0FrP */
+ MGETL D0FrT,D0.5,D0.6,D0.7,[A0FrP++]
+ /* Restore stack pointer to where frame pointer was before increment */
+ SUB A0StP,A0FrP,#0x20
+ /* Restore frame pointer from frame temp */
+ MOV A0FrP,D0FrT
+ /* Return to caller via restored return pointer */
+ MOV PC,D1RtP
+
+If the function hasn't touched the frame pointer, MGETL cannot be safely used
+with A0StP as it always increments and that would expose the stack to clobbering
+by interrupts (kernel) or signals (user). Therefore it's common to see the MGETL
+split into separate GETL instructions:
+
+ /* Restore D0FrT, D1RtP, D{0-1}.{5-7} from stack */
+ GETL D0FrT,D1RtP,[A0StP+#-0x30]
+ GETL D0.5,D1.5,[A0StP+#-0x28]
+ GETL D0.6,D1.6,[A0StP+#-0x20]
+ GETL D0.7,D1.7,[A0StP+#-0x18]
+ /* Restore stack pointer */
+ SUB A0StP,A0StP,#0x30
+ /* Return to caller via restored return pointer */
+ MOV PC,D1RtP
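The argument packing described in the "System call ABI" section of
kernel-ABI.txt above (64-bit values split into consecutive 32-bit arguments)
can be illustrated with the fadvise64_64 case. This is a sketch only: struct
metag_fadvise_args and pack_fadvise() are hypothetical names that model the six
argument slots; they are not kernel or libc interfaces.

```c
#include <stdint.h>

/* Hypothetical view of the six 32-bit syscall argument slots. */
struct metag_fadvise_args {
	int32_t fd;		/* arg #1, D1Ar1 */
	uint32_t offs_lo;	/* arg #2, D0Ar2 */
	uint32_t offs_hi;	/* arg #3, D1Ar3 */
	uint32_t len_lo;	/* arg #4, D0Ar4 */
	uint32_t len_hi;	/* arg #5, D1Ar5 */
	int32_t advice;		/* arg #6, D0Ar6 */
};

/* Split each 64-bit value into low and high halves, low half first. */
static struct metag_fadvise_args pack_fadvise(int32_t fd, int64_t offs,
					      int64_t len, int32_t advice)
{
	struct metag_fadvise_args a;

	a.fd = fd;
	a.offs_lo = (uint32_t)((uint64_t)offs & 0xffffffffu);
	a.offs_hi = (uint32_t)((uint64_t)offs >> 32);
	a.len_lo = (uint32_t)((uint64_t)len & 0xffffffffu);
	a.len_hi = (uint32_t)((uint64_t)len >> 32);
	a.advice = advice;
	return a;
}
```

Packing the halves back to back is what lets all four logical arguments fit in
the six available argument registers; placing each 64-bit value in a matching
D0/D1 pair, as the normal calling convention does, would leave gaps.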
diff --git a/Documentation/misc-devices/mei/mei-client-bus.txt b/Documentation/misc-devices/mei/mei-client-bus.txt
new file mode 100644
index 000000000000..f83910a8ce76
--- /dev/null
+++ b/Documentation/misc-devices/mei/mei-client-bus.txt
@@ -0,0 +1,138 @@
+Intel(R) Management Engine (ME) Client bus API
+===============================================
+
+
+Rationale
+=========
+The MEI misc character device is useful for dedicated applications to send and receive
+data to the many FW appliances found in Intel's ME from user space.
+However, for some of the ME functionalities it makes sense to leverage the existing software
+stack and expose them through existing kernel subsystems.
+
+In order to plug seamlessly into the kernel device driver model we add a kernel virtual
+bus abstraction on top of the MEI driver. This allows implementing Linux kernel drivers
+for the various MEI features as standalone entities found in their respective subsystems.
+Existing device drivers can even potentially be re-used by adding an MEI CL bus layer to
+the existing code.
+
+
+MEI CL bus API
+==============
+A driver implementation for an MEI Client is very similar to existing bus
+based device drivers. The driver registers itself as an MEI CL bus driver through
+the mei_cl_driver structure:
+
+struct mei_cl_driver {
+ struct device_driver driver;
+ const char *name;
+
+ const struct mei_cl_device_id *id_table;
+
+ int (*probe)(struct mei_cl_device *dev, const struct mei_cl_device_id *id);
+ int (*remove)(struct mei_cl_device *dev);
+};
+
+struct mei_cl_device_id {
+ char name[MEI_NAME_SIZE];
+ kernel_ulong_t driver_info;
+};
+
+The mei_cl_device_id structure allows the driver to bind itself against a device name.
+
+To actually register a driver on the ME Client bus one must call the
+mei_cl_driver_register() API. This is typically called at module init time.
+
+Once registered on the ME Client bus, a driver will typically try to do some I/O on
+this bus and this should be done through the mei_cl_send() and mei_cl_recv()
+routines. The latter is synchronous (blocks and sleeps until data shows up).
+In order for drivers to be notified of pending events waiting for them (e.g.
+an Rx event) they can register an event handler through the
+mei_cl_register_event_cb() routine. Currently only the MEI_EVENT_RX event
+will trigger an event handler call and the driver implementation is supposed
+to call mei_cl_recv() from the event handler in order to fetch the pending
+received buffers.
+
+
+Example
+=======
+As a theoretical example let's pretend the ME comes with a "contact" NFC IP.
+The driver init and exit routines for this device would look like:
+
+#define CONTACT_DRIVER_NAME "contact"
+
+static struct mei_cl_device_id contact_mei_cl_tbl[] = {
+ { CONTACT_DRIVER_NAME, },
+
+ /* required last entry */
+ { }
+};
+MODULE_DEVICE_TABLE(mei_cl, contact_mei_cl_tbl);
+
+static struct mei_cl_driver contact_driver = {
+ .id_table = contact_mei_cl_tbl,
+ .name = CONTACT_DRIVER_NAME,
+
+ .probe = contact_probe,
+ .remove = contact_remove,
+};
+
+static int __init contact_init(void)
+{
+ int r;
+
+ r = mei_cl_driver_register(&contact_driver);
+ if (r) {
+ pr_err(CONTACT_DRIVER_NAME ": driver registration failed\n");
+ return r;
+ }
+
+ return 0;
+}
+
+static void __exit contact_exit(void)
+{
+ mei_cl_driver_unregister(&contact_driver);
+}
+
+module_init(contact_init);
+module_exit(contact_exit);
+
+And the driver's simplified probe routine would look like this:
+
+int contact_probe(struct mei_cl_device *dev, const struct mei_cl_device_id *id)
+{
+ struct contact_driver *contact;
+
+ [...]
+ mei_cl_enable_device(dev);
+
+ mei_cl_register_event_cb(dev, contact_event_cb, contact);
+
+ return 0;
+ }
+
+In the probe routine the driver first enables the MEI device and then registers
+an ME bus event handler which is as close as it can get to registering a
+threaded IRQ handler.
+The handler implementation will typically call some I/O routine depending on
+the pending events:
+
+#define MAX_NFC_PAYLOAD 128
+
+static void contact_event_cb(struct mei_cl_device *dev, u32 events,
+ void *context)
+{
+ struct contact_driver *contact = context;
+
+ if (events & BIT(MEI_EVENT_RX)) {
+ u8 payload[MAX_NFC_PAYLOAD];
+ int payload_size;
+
+ payload_size = mei_cl_recv(dev, payload, MAX_NFC_PAYLOAD);
+ if (payload_size <= 0)
+ return;
+
+ /* Hook to the NFC subsystem */
+ nfc_hci_recv_frame(contact->hdev, payload, payload_size);
+ }
+}
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
index f2a2488f1bf3..9573d0c48c6e 100644
--- a/Documentation/networking/ipvs-sysctl.txt
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -15,6 +15,13 @@ amemthresh - INTEGER
enabled and the variable is automatically set to 2, otherwise
the strategy is disabled and the variable is set to 1.
+backup_only - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ If set, disable the director function while the server is
+ in backup mode to avoid packet loops for DR/TUN methods.
+
conntrack - BOOLEAN
0 - disabled (default)
not 0 - enabled
diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.txt
index c0aab985bad9..949d5dcdd9a3 100644
--- a/Documentation/networking/tuntap.txt
+++ b/Documentation/networking/tuntap.txt
@@ -105,6 +105,83 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
Proto [2 bytes]
Raw protocol(IP, IPv6, etc) frame.
+ 3.3 Multiqueue tuntap interface:
+
+ From version 3.8, Linux supports multiqueue tuntap which can use multiple
+ file descriptors (queues) to parallelize packet sending or receiving. The
+ device allocation is the same as before, and if the user wants to create
+ multiple queues, TUNSETIFF with the same device name must be called once
+ per queue with the IFF_MULTI_QUEUE flag.
+
+ char *dev should be the name of the device, queues is the number of queues to
+ be created, fds is used to store and return the file descriptors (queues)
+ created to the caller. Each file descriptor serves as the interface of a
+ queue which can be accessed by userspace.
+
+ #include <linux/if.h>
+ #include <linux/if_tun.h>
+
+ int tun_alloc_mq(char *dev, int queues, int *fds)
+ {
+ struct ifreq ifr;
+ int fd, err, i;
+
+ if (!dev)
+ return -1;
+
+ memset(&ifr, 0, sizeof(ifr));
+ /* Flags: IFF_TUN - TUN device (no Ethernet headers)
+ * IFF_TAP - TAP device
+ *
+ * IFF_NO_PI - Do not provide packet information
+ * IFF_MULTI_QUEUE - Create a queue of multiqueue device
+ */
+ ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
+ strcpy(ifr.ifr_name, dev);
+
+ for (i = 0; i < queues; i++) {
+ if ((err = fd = open("/dev/net/tun", O_RDWR)) < 0)
+ goto err;
+ err = ioctl(fd, TUNSETIFF, (void *)&ifr);
+ if (err) {
+ close(fd);
+ goto err;
+ }
+ fds[i] = fd;
+ }
+
+ return 0;
+ err:
+ for (--i; i >= 0; i--)
+ close(fds[i]);
+ return err;
+ }
+
+ A new ioctl(TUNSETQUEUE) was introduced to enable or disable a queue. When
+ calling it with the IFF_DETACH_QUEUE flag, the queue is disabled. When
+ calling it with the IFF_ATTACH_QUEUE flag, the queue is enabled. A queue is
+ enabled by default after it is created through TUNSETIFF.
+
+ fd is the file descriptor (queue) that we want to enable or disable; when
+ enable is true we enable it, otherwise we disable it.
+
+ #include <linux/if.h>
+ #include <linux/if_tun.h>
+
+ int tun_set_queue(int fd, int enable)
+ {
+ struct ifreq ifr;
+
+ memset(&ifr, 0, sizeof(ifr));
+
+ if (enable)
+ ifr.ifr_flags = IFF_ATTACH_QUEUE;
+ else
+ ifr.ifr_flags = IFF_DETACH_QUEUE;
+
+ return ioctl(fd, TUNSETQUEUE, (void *)&ifr);
+ }
+
Universal TUN/TAP device driver Frequently Asked Question.
1. What platforms are supported by TUN/TAP driver ?
diff --git a/Documentation/pinctrl.txt b/Documentation/pinctrl.txt
index a2b57e0a1db0..447fd4cd54ec 100644
--- a/Documentation/pinctrl.txt
+++ b/Documentation/pinctrl.txt
@@ -736,6 +736,13 @@ All the above functions are mandatory to implement for a pinmux driver.
Pin control interaction with the GPIO subsystem
===============================================
+Note that the following implies that the use case is to use a certain pin
+from the Linux kernel using the API in <linux/gpio.h> with gpio_request()
+and similar functions. There are cases where you may be using something
+that your datasheet calls "GPIO mode" but actually is just an electrical
+configuration for a certain device. See the section below named
+"GPIO mode pitfalls" for more details on this scenario.
+
The public pinmux API contains two functions named pinctrl_request_gpio()
and pinctrl_free_gpio(). These two functions shall *ONLY* be called from
gpiolib-based drivers as part of their gpio_request() and
@@ -774,6 +781,111 @@ obtain the function "gpioN" where "N" is the global GPIO pin number if no
special GPIO-handler is registered.
+GPIO mode pitfalls
+==================
+
+Sometimes the developer may be confused by a datasheet talking about a pin
+that can be set into "GPIO mode". It appears that what hardware
+engineers mean by "GPIO mode" is not necessarily the use case that is
+implied in the kernel interface <linux/gpio.h>: a pin that you grab from
+kernel code and then either listen for input or drive high/low to
+assert/deassert some external line.
+
+Rather, hardware engineers think that "GPIO mode" means that you can
+software-control a few electrical properties of the pin that you would
+not be able to control if the pin was in some other mode, such as muxed in
+for a device.
+
+Example: a pin is usually muxed in to be used as a UART TX line. But during
+system sleep, we need to put this pin into "GPIO mode" and ground it.
+
+If you make a 1-to-1 map to the GPIO subsystem for this pin, you may start
+to think that you need to come up with something really complex, that the
+pin shall be used for UART TX and GPIO at the same time, that you will grab
+a pin control handle and set it to a certain state to enable UART TX to be
+muxed in, then twist it over to GPIO mode and use gpio_direction_output()
+to drive it low during sleep, then mux it over to UART TX again when you
+wake up and maybe even gpio_request/gpio_free as part of this cycle. This
+all gets very complicated.
+
+The solution is to not think that what the datasheet calls "GPIO mode"
+has to be handled by the <linux/gpio.h> interface. Instead view this as
+a certain pin config setting. Look in e.g. <linux/pinctrl/pinconf-generic.h>
+and you find this in the documentation:
+
+ PIN_CONFIG_OUTPUT: this will configure the pin in output, use argument
+ 1 to indicate high level, argument 0 to indicate low level.
+
+So it is perfectly possible to push a pin into "GPIO mode" and drive the
+line low as part of the usual pin control map. So for example your UART
+driver may look like this:
+
+#include <linux/pinctrl/consumer.h>
+
+struct pinctrl *pinctrl;
+struct pinctrl_state *pins_default;
+struct pinctrl_state *pins_sleep;
+
+pins_default = pinctrl_lookup_state(pinctrl, PINCTRL_STATE_DEFAULT);
+pins_sleep = pinctrl_lookup_state(pinctrl, PINCTRL_STATE_SLEEP);
+
+/* Normal mode */
+retval = pinctrl_select_state(pinctrl, pins_default);
+/* Sleep mode */
+retval = pinctrl_select_state(pinctrl, pins_sleep);
+
+And your machine configuration may look like this:
+--------------------------------------------------
+
+static unsigned long uart_default_mode[] = {
+ PIN_CONF_PACKED(PIN_CONFIG_DRIVE_PUSH_PULL, 0),
+};
+
+static unsigned long uart_sleep_mode[] = {
+ PIN_CONF_PACKED(PIN_CONFIG_OUTPUT, 0),
+};
+
+static struct pinctrl_map __initdata pinmap[] = {
+ PIN_MAP_MUX_GROUP("uart", PINCTRL_STATE_DEFAULT, "pinctrl-foo",
+ "u0_group", "u0"),
+ PIN_MAP_CONFIGS_PIN("uart", PINCTRL_STATE_DEFAULT, "pinctrl-foo",
+ "UART_TX_PIN", uart_default_mode),
+ PIN_MAP_MUX_GROUP("uart", PINCTRL_STATE_SLEEP, "pinctrl-foo",
+ "u0_group", "gpio-mode"),
+ PIN_MAP_CONFIGS_PIN("uart", PINCTRL_STATE_SLEEP, "pinctrl-foo",
+ "UART_TX_PIN", uart_sleep_mode),
+};
+
+static int __init foo_init(void) {
+ return pinctrl_register_mappings(pinmap, ARRAY_SIZE(pinmap));
+}
+
+Here the pins we want to control are in the "u0_group" and there is some
+function called "u0" that can be enabled on this group of pins, and then
+everything is UART business as usual. But there is also some function
+named "gpio-mode" that can be mapped onto the same pins to move them into
+GPIO mode.
+
+This will give the desired effect without any bogus interaction with the
+GPIO subsystem. It is just an electrical configuration used by that device
+when going to sleep, it might imply that the pin is set into something the
+datasheet calls "GPIO mode" but that is not the point: it is still used
+by that UART device to control the pins that pertain to that very UART
+driver, putting them into modes needed by the UART. A GPIO in the Linux
+kernel sense is just a 1-bit line, and is a different use case.
+
+How the registers are poked to attain the push/pull and output low
+configuration and the muxing of the "u0" or "gpio-mode" group onto these
+pins is a question for the driver.
+
+Some datasheets will be more helpful and refer to the "GPIO mode" as
+"low power mode" rather than anything to do with GPIO. This often means
+the same thing electrically speaking, but in this latter case the
+software engineers will usually quickly identify that this is some
+specific muxing/configuration rather than anything related to the GPIO
+API.
+
+
Board/machine configuration
==================================
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
index 3035d00757ad..425c51d56aef 100644
--- a/Documentation/power/opp.txt
+++ b/Documentation/power/opp.txt
@@ -1,6 +1,5 @@
-*=============*
-* OPP Library *
-*=============*
+Operating Performance Points (OPP) Library
+==========================================
(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
@@ -16,15 +15,31 @@ Contents
1. Introduction
===============
+1.1 What is an Operating Performance Point (OPP)?
+
Complex SoCs of today consists of a multiple sub-modules working in conjunction.
In an operational system executing varied use cases, not all modules in the SoC
need to function at their highest performing frequency all the time. To
facilitate this, sub-modules in a SoC are grouped into domains, allowing some
-domains to run at lower voltage and frequency while other domains are loaded
-more. The set of discrete tuples consisting of frequency and voltage pairs that
+domains to run at lower voltage and frequency while other domains run at
+voltage/frequency pairs that are higher.
+
+The set of discrete tuples consisting of frequency and voltage pairs that
the device will support per domain are called Operating Performance Points or
OPPs.
+As an example:
+Let us consider an MPU device which supports the following:
+{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V},
+{1GHz at minimum voltage of 1.3V}
+
+We can represent these as three OPPs as the following {Hz, uV} tuples:
+{300000000, 1000000}
+{800000000, 1200000}
+{1000000000, 1300000}
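The three {Hz, uV} tuples above can be held in a small table and searched. The following is only a minimal sketch, assuming a table sorted by ascending frequency; struct opp_entry, mpu_opps and opp_find_ceil() are hypothetical names chosen for illustration and are not the kernel OPP library API.

```c
#include <assert.h>
#include <stddef.h>

/* One operating performance point: a {Hz, uV} tuple as in the text above. */
struct opp_entry {
	unsigned long freq_hz;
	unsigned long volt_uv;
};

/* The three example OPPs for the MPU, sorted by ascending frequency. */
static const struct opp_entry mpu_opps[] = {
	{  300000000UL, 1000000UL },
	{  800000000UL, 1200000UL },
	{ 1000000000UL, 1300000UL },
};

/*
 * Hypothetical helper (not the kernel OPP API): return the slowest OPP
 * whose frequency is at least min_hz, or NULL if no OPP is fast enough.
 * Assumes the table is sorted by ascending frequency.
 */
static const struct opp_entry *opp_find_ceil(const struct opp_entry *tbl,
					     size_t n, unsigned long min_hz)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++)
		if (tbl[i].freq_hz >= min_hz)
			return &tbl[i];
	return NULL;
}
```

With this table, a request for at least 500MHz selects the {800000000, 1200000} entry, and a request above 1GHz finds no match.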
+
+1.2 Operating Performance Points Library
+
OPP library provides a set of helper functions to organize and query the OPP
information. The library is located in drivers/base/power/opp.c and the header
is located in include/linux/opp.h. OPP library can be enabled by enabling
diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index e8a6aa473bab..6e953564de03 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -170,5 +170,5 @@ Reminder: sizeof() result is of type size_t.
Thank you for your cooperation and attention.
-By Randy Dunlap <rdunlap@xenotime.net> and
+By Randy Dunlap <rdunlap@infradead.org> and
Andrew Murray <amurray@mpc-data.co.uk>
diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt
index ae66f9b90a25..fcaf0b4efba2 100644
--- a/Documentation/s390/s390dbf.txt
+++ b/Documentation/s390/s390dbf.txt
@@ -143,7 +143,8 @@ Parameter: id: handle for debug log
Return Value: none
-Description: frees memory for a debug log
+Description: frees memory for a debug log and removes all registered debug
+ views.
Must not be called within an interrupt handler
---------------------------------------------------------------------------
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas
index da03146c182a..09673c7fc8ee 100644
--- a/Documentation/scsi/ChangeLog.megaraid_sas
+++ b/Documentation/scsi/ChangeLog.megaraid_sas
@@ -1,3 +1,12 @@
+Release Date : Sat. Feb 9, 2013 17:00:00 PST 2013 -
+ (emaild-id:megaraidlinux@lsi.com)
+ Adam Radford
+Current Version : 06.506.00.00-rc1
+Old Version : 06.504.01.00-rc1
+ 1. Add 4k FastPath DIF support.
+ 2. Dont load DevHandle unless FastPath enabled.
+ 3. Version and Changelog update.
+-------------------------------------------------------------------------------
Release Date : Mon. Oct 1, 2012 17:00:00 PST 2012 -
(emaild-id:megaraidlinux@lsi.com)
Adam Radford
diff --git a/Documentation/scsi/LICENSE.qla2xxx b/Documentation/scsi/LICENSE.qla2xxx
index 27a91cf43d6d..5020b7b5a244 100644
--- a/Documentation/scsi/LICENSE.qla2xxx
+++ b/Documentation/scsi/LICENSE.qla2xxx
@@ -1,4 +1,4 @@
-Copyright (c) 2003-2012 QLogic Corporation
+Copyright (c) 2003-2013 QLogic Corporation
QLogic Linux FC-FCoE Driver
This program includes a device driver for Linux 3.x.
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index ce6581c8ca26..95731a08f257 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -890,9 +890,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
enable_msi - Enable Message Signaled Interrupt (MSI) (default = off)
power_save - Automatic power-saving timeout (in second, 0 =
disable)
- power_save_controller - Support runtime D3 of HD-audio controller
- (-1 = on for supported chip (default), false = off,
- true = force to on even for unsupported hardware)
+ power_save_controller - Reset HD-audio controller in power-saving mode
+ (default = on)
align_buffer_size - Force rounding of buffer/period sizes to multiples
of 128 bytes. This is more efficient in terms of memory
access but isn't required by the HDA spec and prevents
@@ -912,7 +911,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
models depending on the codec chip. The list of available models
is found in HD-Audio-Models.txt
- The model name "genric" is treated as a special case. When this
+ The model name "generic" is treated as a special case. When this
model is given, the driver uses the generic codec parser without
"codec-patch". It's sometimes good for testing and debugging.
diff --git a/Documentation/sound/alsa/seq_oss.html b/Documentation/sound/alsa/seq_oss.html
index d9776cf60c07..9663b45f6fde 100644
--- a/Documentation/sound/alsa/seq_oss.html
+++ b/Documentation/sound/alsa/seq_oss.html
@@ -285,7 +285,7 @@ sample data.
<H4>
7.2.4 Close Callback</H4>
The <TT>close</TT> callback is called when this device is closed by the
-applicaion. If any private data was allocated in open callback, it must
+application. If any private data was allocated in open callback, it must
be released in the close callback. The deletion of ALSA port should be
done here, too. This callback must not be NULL.
<H4>
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 078701fdbd4d..dcc75a9ed919 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -18,6 +18,7 @@ files can be found in mm/swap.c.
Currently, these files are in /proc/sys/vm:
+- admin_reserve_kbytes
- block_dump
- compact_memory
- dirty_background_bytes
@@ -53,11 +54,41 @@ Currently, these files are in /proc/sys/vm:
- percpu_pagelist_fraction
- stat_interval
- swappiness
+- user_reserve_kbytes
- vfs_cache_pressure
- zone_reclaim_mode
==============================================================
+admin_reserve_kbytes
+
+The amount of free memory in the system that should be reserved for users
+with the capability cap_sys_admin.
+
+admin_reserve_kbytes defaults to min(3% of free pages, 8MB)
+
+That should provide enough for the admin to log in and kill a process,
+if necessary, under the default overcommit 'guess' mode.
+
+Systems running under overcommit 'never' should increase this to account
+for the full Virtual Memory Size of programs used to recover. Otherwise,
+root may not be able to log in to recover the system.
+
+How do you calculate a minimum useful reserve?
+
+sshd or login + bash (or some other shell) + top (or ps, kill, etc.)
+
+For overcommit 'guess', we can sum resident set sizes (RSS).
+On x86_64 this is about 8MB.
+
+For overcommit 'never', we can take the max of their virtual sizes (VSZ)
+and add the sum of their RSS.
+On x86_64 this is about 128MB.
+
+Changing this takes effect whenever an application requests memory.
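The two sizing rules above (sum of RSS for 'guess', largest VSZ plus sum of RSS for 'never') can be written out as follows. This is a rough userspace sketch of the arithmetic only; struct prog_mem and both function names are hypothetical, and the per-program figures have to be measured on the target system (for example with ps).

```c
#include <assert.h>
#include <stddef.h>

/* Memory footprint of one recovery program (sshd, bash, top, ...), in kB. */
struct prog_mem {
	unsigned long rss_kb;	/* resident set size */
	unsigned long vsz_kb;	/* virtual size */
};

/* Overcommit 'guess': sum of the resident set sizes. */
static unsigned long reserve_guess_kb(const struct prog_mem *p, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	assert(p != NULL || n == 0);
	for (i = 0; i < n; i++)
		sum += p[i].rss_kb;
	return sum;
}

/* Overcommit 'never': largest virtual size plus the sum of the RSS values. */
static unsigned long reserve_never_kb(const struct prog_mem *p, size_t n)
{
	unsigned long max_vsz = 0;
	size_t i;

	assert(p != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (p[i].vsz_kb > max_vsz)
			max_vsz = p[i].vsz_kb;
	return max_vsz + reserve_guess_kb(p, n);
}
```

The result is a candidate value for admin_reserve_kbytes; whether it is sufficient still depends on which programs are really needed to recover the system.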
+
+==============================================================
+
block_dump
block_dump enables block I/O debugging when set to a nonzero value. More
@@ -542,6 +573,7 @@ memory until it actually runs out.
When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.
+Note that user_reserve_kbytes affects this policy.
This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
@@ -645,6 +677,24 @@ The default value is 60.
==============================================================
+user_reserve_kbytes
+
+When overcommit_memory is set to 2, "never overcommit" mode, reserve
+min(3% of current process size, user_reserve_kbytes) of free memory.
+This is intended to prevent a user from starting a single memory hogging
+process, such that they cannot recover (kill the hog).
+
+user_reserve_kbytes defaults to min(3% of the current process size, 128MB).
+
+If this is reduced to zero, then the user will be allowed to allocate
+all free memory with a single process, minus admin_reserve_kbytes.
+Any subsequent attempts to execute a command will result in
+"fork: Cannot allocate memory".
+
+Changing this takes effect whenever an application requests memory.
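The reserve described above, min(3% of current process size, user_reserve_kbytes), can be illustrated with a few lines of C. This is a sketch of the documented formula in kB units, not the kernel's implementation; the function name is made up for the example.

```c
#include <assert.h>

/*
 * Hypothetical helper: memory withheld from the current process in
 * "never overcommit" mode, in kB, following the formula in the text:
 * min(3% of current process size, user_reserve_kbytes).
 */
static unsigned long user_reserve_kb(unsigned long proc_size_kb,
				     unsigned long user_reserve_kbytes)
{
	/* 3% of the process size; widen first so the multiply cannot overflow. */
	unsigned long long pct = (unsigned long long)proc_size_kb * 3 / 100;
	unsigned long reserve;

	reserve = pct < user_reserve_kbytes ? (unsigned long)pct
					    : user_reserve_kbytes;
	/* The sysctl value is an upper bound on the reserve. */
	assert(reserve <= user_reserve_kbytes);
	return reserve;
}
```

For a process of 1000000 kB and a limit of 131072 kB (128MB) this gives 30000 kB; with the sysctl set to zero nothing is reserved, which matches the behaviour described above.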
+
+==============================================================
+
vfs_cache_pressure
------------------
diff --git a/Documentation/thermal/exynos_thermal_emulation b/Documentation/thermal/exynos_thermal_emulation
new file mode 100644
index 000000000000..b73bbfb697bb
--- /dev/null
+++ b/Documentation/thermal/exynos_thermal_emulation
@@ -0,0 +1,53 @@
+EXYNOS EMULATION MODE
+========================
+
+Copyright (C) 2012 Samsung Electronics
+
+Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
+
+Description
+-----------
+
+Exynos 4x12 (4212, 4412) and 5 series provide an emulation mode for the thermal management
+unit (TMU). Thermal emulation mode supports software debugging of the TMU's operation. The user
+can set the temperature manually from software, and the TMU will read the current temperature
+from the user-supplied value instead of from the sensor.
+
+Enabling the CONFIG_EXYNOS_THERMAL_EMUL option makes this support available.
+When it is enabled, a sysfs node named 'emulation' is created under
+/sys/bus/platform/devices/'exynos device name'/.
+
+The sysfs node, 'emulation', contains the value 0 in the initial state. When you write any
+temperature to the sysfs node, emulation mode is enabled automatically and the
+current temperature changes to that value.
+(Exynos also supports a user-changeable delay time which is used to delay the
+ temperature change. However, this node only uses the same delay as the real sensing time, 938us.)
+
+Exynos emulation mode requires the value change and the enabling to be synchronous. This means
+that when you want to update the delay or the next temperature, you have to enable emulation
+mode at the same time. (Or you have to keep the mode enabled.) If you don't, the value is not
+changed to the updated one and the last successful value is used repeatedly. That's why
+this node gives users the right to change the temperature only. Having just one interface makes
+it simpler to use.
+
+Disabling emulation mode only requires writing value 0 to sysfs node.
+
+
+TEMP 120 |
+ |
+ 100 |
+ |
+ 80 |
+ | +-----------
+ 60 | | |
+ | +-------------| |
+ 40 | | | |
+ | | | |
+ 20 | | | +----------
+ | | | | |
+ 0 |______________|_____________|__________|__________|_________
+ A A A A TIME
+ |<----->| |<----->| |<----->| |
+ | 938us | | | | | |
+emulation : 0 50 | 70 | 20 | 0
+current temp : sensor 50 70 20 sensor
diff --git a/Documentation/thermal/intel_powerclamp.txt b/Documentation/thermal/intel_powerclamp.txt
new file mode 100644
index 000000000000..332de4a39b5a
--- /dev/null
+++ b/Documentation/thermal/intel_powerclamp.txt
@@ -0,0 +1,307 @@
+ =======================
+ INTEL POWERCLAMP DRIVER
+ =======================
+By: Arjan van de Ven <arjan@linux.intel.com>
+ Jacob Pan <jacob.jun.pan@linux.intel.com>
+
+Contents:
+ (*) Introduction
+ - Goals and Objectives
+
+ (*) Theory of Operation
+ - Idle Injection
+ - Calibration
+
+ (*) Performance Analysis
+ - Effectiveness and Limitations
+ - Power vs Performance
+ - Scalability
+ - Calibration
+ - Comparison with Alternative Techniques
+
+ (*) Usage and Interfaces
+ - Generic Thermal Layer (sysfs)
+ - Kernel APIs (TBD)
+
+============
+INTRODUCTION
+============
+
+Consider the situation where a system’s power consumption must be
+reduced at runtime, due to power budget, thermal constraint, or noise
+level, and where active cooling is not preferred. Software managed
+passive power reduction must be performed to prevent the hardware
+actions that are designed for catastrophic scenarios.
+
+Currently, P-states, T-states (clock modulation), and CPU offlining
+are used for CPU throttling.
+
+On Intel CPUs, C-states provide effective power reduction, but so far
+they’re only used opportunistically, based on workload. With the
+development of intel_powerclamp driver, the method of synchronizing
+idle injection across all online CPU threads was introduced. The goal
+is to achieve forced and controllable C-state residency.
+
+Test/Analysis has been made in the areas of power, performance,
+scalability, and user experience. In many cases, clear advantage is
+shown over taking the CPU offline or modulating the CPU clock.
+
+
+===================
+THEORY OF OPERATION
+===================
+
+Idle Injection
+--------------
+
+On modern Intel processors (Nehalem or later), package level C-state
+residency is available in MSRs, thus also available to the kernel.
+
+These MSRs are:
+ #define MSR_PKG_C2_RESIDENCY 0x60D
+ #define MSR_PKG_C3_RESIDENCY 0x3F8
+ #define MSR_PKG_C6_RESIDENCY 0x3F9
+ #define MSR_PKG_C7_RESIDENCY 0x3FA
+
+If the kernel can also inject idle time to the system, then a
+closed-loop control system can be established that manages package
+level C-state. The intel_powerclamp driver is conceived as such a
+control system, where the target set point is a user-selected idle
+ratio (based on power reduction), and the error is the difference
+between the actual package level C-state residency ratio and the target idle
+ratio.
+
+Injection is controlled by high priority kernel threads, spawned for
+each online CPU.
+
+These kernel threads, with SCHED_FIFO class, are created to perform
+clamping actions of controlled duty ratio and duration. Each per-CPU
+thread synchronizes its idle time and duration, based on the rounding
+of jiffies, so accumulated errors can be prevented to avoid a jittery
+effect. Threads are also bound to the CPU such that they cannot be
+migrated, unless the CPU is taken offline. In this case, threads
+belonging to the offlined CPUs will be terminated immediately.
+
+Running as SCHED_FIFO at a relatively high priority also allows such a
+scheme to work for both preemptable and non-preemptable kernels.
+Alignment of idle time around jiffies ensures scalability for HZ
+values. This effect can be better visualized using a Perf timechart.
+The following diagram shows the behavior of kernel thread
+kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
+for a given "duration", then relinquishes the CPU to other tasks,
+until the next time interval.
+
+The NOHZ schedule tick is disabled during idle time, but interrupts
+are not masked. Tests show that the extra wakeups from scheduler tick
+have a dramatic impact on the effectiveness of the powerclamp driver
+on large scale systems (Westmere system with 80 processors).
+
+CPU0
+ ____________ ____________
+kidle_inject/0 | sleep | mwait | sleep |
+ _________| |________| |_______
+ duration
+CPU1
+ ____________ ____________
+kidle_inject/1 | sleep | mwait | sleep |
+ _________| |________| |_______
+ ^
+ |
+ |
+ roundup(jiffies, interval)
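The alignment shown in the diagram relies on every per-CPU thread rounding the current tick count up to the same interval boundary. A minimal sketch of that arithmetic is given below; next_injection() is a hypothetical name and the tick values in the usage note are made up, only the round-up-to-a-multiple calculation reflects roundup(jiffies, interval).

```c
#include <assert.h>

/*
 * Round 'now' up to the next multiple of 'interval'. Threads that sample
 * 'now' at slightly different ticks within the same interval all obtain
 * the same target, which keeps their idle periods aligned.
 */
static unsigned long next_injection(unsigned long now, unsigned long interval)
{
	assert(interval > 0);
	return ((now + interval - 1) / interval) * interval;
}
```

For example, with an interval of 10 ticks, a thread sampling tick 1003 and another sampling tick 1009 both target tick 1010, while a thread already at tick 1010 stays on that boundary.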
+
+Only one CPU is allowed to collect statistics and update global
+control parameters. This CPU is referred to as the controlling CPU in
+this document. The controlling CPU is elected at runtime, with a
+policy that favors BSP, taking into account the possibility of a CPU
+hot-plug.
+
+In terms of dynamics of the idle control system, package level idle
+time is considered largely as a non-causal system where its behavior
+cannot be based on the past or current input. Therefore, the
+intel_powerclamp driver attempts to enforce the desired idle time
+instantly as given input (target idle ratio). After injection,
+powerclamp monitors the actual idle for a given time window and adjusts
+the next injection accordingly to avoid over/under correction.
+
+When used in a causal control system, such as a temperature control,
+it is up to the user of this driver to implement algorithms where
+past samples and outputs are included in the feedback. For example, a
+PID-based thermal controller can use the powerclamp driver to
+maintain a desired target temperature, based on integral and
+derivative gains of the past samples.
+
+
+
+Calibration
+-----------
+During scalability testing, it is observed that synchronized actions
+among CPUs become challenging as the number of cores grows. This is
+also true for the ability of a system to enter package level C-states.
+
+To make sure the intel_powerclamp driver scales well, online
+calibration is implemented. The goals for doing such a calibration
+are:
+
+a) determine the effective range of idle injection ratio
+b) determine the amount of compensation needed at each target ratio
+
+Compensation to each target ratio consists of two parts:
+
+ a) steady state error compensation
+ This is to offset the error occurring when the system can
+ enter idle without extra wakeups (such as external interrupts).
+
+ b) dynamic error compensation
+ When an excessive amount of wakeups occurs during idle, an
+ additional idle ratio can be added to quiet interrupts, by
+ slowing down CPU activities.
+
+A debugfs file is provided for the user to examine compensation
+progress and results, such as on a Westmere system.
+[jacob@nex01 ~]$ cat
+/sys/kernel/debug/intel_powerclamp/powerclamp_calib
+controlling cpu: 0
+pct confidence steady dynamic (compensation)
+0 0 0 0
+1 1 0 0
+2 1 1 0
+3 3 1 0
+4 3 1 0
+5 3 1 0
+6 3 1 0
+7 3 1 0
+8 3 1 0
+...
+30 3 2 0
+31 3 2 0
+32 3 1 0
+33 3 2 0
+34 3 1 0
+35 3 2 0
+36 3 1 0
+37 3 2 0
+38 3 1 0
+39 3 2 0
+40 3 3 0
+41 3 1 0
+42 3 2 0
+43 3 1 0
+44 3 1 0
+45 3 2 0
+46 3 3 0
+47 3 0 0
+48 3 2 0
+49 3 3 0
+
+Calibration occurs during runtime. No offline method is available.
+Steady state compensation is used only when confidence levels of all
+adjacent ratios have reached a satisfactory level. A confidence level
+is accumulated based on clean data collected at runtime. Data
+collected during a period without extra interrupts is considered
+clean.
+
+To compensate for an excessive number of wakeups during idle, additional
+idle time is injected when such a condition is detected. Currently,
+we have a simple algorithm to double the injection ratio. A possible
+enhancement might be to throttle the offending IRQ, such as delaying
+EOI for level triggered interrupts. But it is a challenge to be
+non-intrusive to the scheduler or the IRQ core code.
+
+
+CPU Online/Offline
+------------------
+Per-CPU kernel threads are started/stopped upon receiving
+notifications of CPU hotplug activities. The intel_powerclamp driver
+keeps track of clamping kernel threads, even after they are migrated
+to other CPUs, after a CPU offline event.
+
+
+=====================
+Performance Analysis
+=====================
+This section describes the general performance data collected on
+multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).
+
+Effectiveness and Limitations
+-----------------------------
+The maximum range that idle injection is allowed is capped at 50
+percent. As mentioned earlier, since interrupts are allowed during
+forced idle time, excessive interrupts could result in less
+effectiveness. The extreme case would be doing a ping -f to generate a
+flood of network interrupts without much CPU acknowledgement. In this
+case, little can be done from the idle injection threads. In most
+normal cases, such as copying a large file with scp, applications can be throttled
+by the powerclamp driver, since slowing down the CPU also slows down
+network protocol processing, which in turn reduces interrupts.
+
+When control parameters are changed at runtime by the controlling CPU, it
+may take an additional period for the rest of the CPUs to catch up
+with the changes. During this time, idle injection is out of sync,
+thus not able to enter package C-states at the expected ratio. But
+this effect is minor, in that in most cases change to the target
+ratio is updated much less frequently than the idle injection
+frequency.
+
+Scalability
+-----------
+Tests also show a minor, but measurable, difference between the 4P/8P
+Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
+More compensation is needed on Westmere for the same target idle
+ratio. The compensation also increases as the idle ratio gets larger.
+This is why the calibration code is needed.
+
+On the IVB 8P system, compared to an offline CPU, powerclamp can
+achieve up to 40% better performance per watt (measured by a spin
+counter summed over per CPU counting threads spawned for all running
+CPUs).
+
+====================
+Usage and Interfaces
+====================
+The powerclamp driver is registered to the generic thermal layer as a
+cooling device. Currently, it’s not bound to any thermal zones.
+
+jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
+cur_state:0
+max_state:50
+type:intel_powerclamp
+
+Example usage:
+- To inject 25% idle time
+$ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state"
+
+If the system is not busy and has more than 25% idle time already,
+then the powerclamp driver will not start idle injection, and top
+will not show idle injection kernel threads.
+
+If the system is busy (spin test below) and has less than 25% natural
+idle time, powerclamp kernel threads will do idle injection. These
+threads appear to the scheduler as running, but the overall system
+idle time is still reported. In this example, 24.1% idle is shown.
+This helps the system admin or user determine the cause of a slowdown
+when the powerclamp driver is in action.
+
+
+Tasks: 197 total, 1 running, 196 sleeping, 0 stopped, 0 zombie
+Cpu(s): 71.2%us, 4.7%sy, 0.0%ni, 24.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
+Mem: 3943228k total, 1689632k used, 2253596k free, 74960k buffers
+Swap: 4087804k total, 0k used, 4087804k free, 945336k cached
+
+ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
+ 3352 jacob 20 0 262m 644 428 S 286 0.0 0:17.16 spin
+ 3341 root -51 0 0 0 0 D 25 0.0 0:01.62 kidle_inject/0
+ 3344 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/3
+ 3342 root -51 0 0 0 0 D 25 0.0 0:01.61 kidle_inject/1
+ 3343 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/2
+ 2935 jacob 20 0 696m 125m 35m S 5 3.3 0:31.11 firefox
+ 1546 root 20 0 158m 20m 6640 S 3 0.5 0:26.97 Xorg
+ 2100 jacob 20 0 1223m 88m 30m S 3 2.3 0:23.68 compiz
+
+Tests have shown that by using the powerclamp driver as a cooling
+device, a PID based userspace thermal controller can manage to
+control CPU temperature effectively, when no other thermal influence
+is added. For example, an UltraBook user can compile the kernel while
+staying under a certain temperature (below most active trip points).
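The PID based userspace controller mentioned above is not shown in this document. As a minimal sketch only, one controller step could look like the following. The structure, the gains and the function name are hypothetical and are not part of the powerclamp driver; the result is clamped to the range accepted by cur_state (0 to max_state).

```c
#include <assert.h>

/* Hypothetical controller state; names and gains are illustrative only. */
struct pid_ctl {
	double kp, ki, kd;	/* gains, chosen by the user */
	double integral;	/* accumulated error */
	double prev_err;	/* error seen in the previous step */
};

/*
 * One controller step. Returns the idle percentage that a userspace
 * daemon could write to cur_state, clamped to [0, max_state].
 */
int pid_step(struct pid_ctl *c, double target_c, double temp_c, int max_state)
{
	double err, out;

	assert(c && max_state >= 0);

	err = temp_c - target_c;	/* positive when too hot */
	c->integral += err;
	out = c->kp * err + c->ki * c->integral + c->kd * (err - c->prev_err);
	c->prev_err = err;

	if (out < 0)
		return 0;
	if (out > max_state)
		return max_state;
	return (int)out;
}
```

A daemon would call such a step periodically with the current zone temperature and write the returned value to the cooling device's cur_state file.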
diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index 88c02334e356..6859661c9d31 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -55,6 +55,8 @@ temperature) and throttle appropriate devices.
.get_trip_type: get the type of certain trip point.
.get_trip_temp: get the temperature above which the certain trip point
will be fired.
+ .set_emul_temp: set the emulation temperature which helps in debugging
+ different threshold temperature points.
1.1.2 void thermal_zone_device_unregister(struct thermal_zone_device *tz)
@@ -153,6 +155,7 @@ Thermal zone device sys I/F, created once it's registered:
|---trip_point_[0-*]_temp: Trip point temperature
|---trip_point_[0-*]_type: Trip point type
|---trip_point_[0-*]_hyst: Hysteresis value for this trip point
+ |---emul_temp: Emulated temperature set node
Thermal cooling device sys I/F, created once it's registered:
/sys/class/thermal/cooling_device[0-*]:
@@ -252,6 +255,16 @@ passive
Valid values: 0 (disabled) or greater than 1000
RW, Optional
+emul_temp
+ Interface to set the emulated temperature of a thermal zone
+ (sensor). After this temperature is set, the thermal zone may pass
+ it to the platform emulation function, if one is registered, or
+ cache it locally. This is useful for debugging different temperature
+ thresholds and their associated cooling actions. This is a write-only
+ node; writing 0 to it should disable emulation.
+ Unit: millidegree Celsius
+ WO, Optional
+
*****************************
* Cooling device attributes *
*****************************
@@ -329,8 +342,9 @@ The framework includes a simple notification mechanism, in the form of a
netlink event. Netlink socket initialization is done during the _init_
of the framework. Drivers which intend to use the notification mechanism
just need to call thermal_generate_netlink_event() with two arguments viz
-(originator, event). Typically the originator will be an integer assigned
-to a thermal_zone_device when it registers itself with the framework. The
+(originator, event). The originator is a pointer to the struct
+thermal_zone_device from which the event originated. An integer which
+represents the thermal zone device is used in the message to identify the
+zone. The
event will be one of:{THERMAL_AUX0, THERMAL_AUX1, THERMAL_CRITICAL,
THERMAL_DEV_FAULT}. Notification can be sent when the current temperature
crosses any of the configured thresholds.
diff --git a/Documentation/this_cpu_ops.txt b/Documentation/this_cpu_ops.txt
new file mode 100644
index 000000000000..1a4ce7e3e05f
--- /dev/null
+++ b/Documentation/this_cpu_ops.txt
@@ -0,0 +1,205 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu
+variables associated with the *currently* executing processor through
+the use of segment registers (or a dedicated register where the cpu
+permanently stores the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add a per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction
+operating on the per cpu variable.
+
+This means there are no atomicity issues between the calculation of
+the offset and the operation on the data. Therefore it is not
+necessary to disable preempt or interrupts to ensure that the
+processor is not changed between the calculation of the address and
+the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate
+without the typical synchronization overhead but still provide some
+sort of relaxed atomicity guarantee. x86, for example, can execute
+RMW (Read Modify Write) instructions like inc/dec/cmpxchg without the
+lock prefix and the associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu
+data specific to the currently executing processor. Only the current
+processor should be accessing that variable and therefore there are no
+concurrency issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the base of the
+per cpu area. It is then possible to simply use the segment override
+to relocate a per cpu relative address to the proper per cpu area for
+the processor. So the relocation to the per cpu base is encoded in the
+instruction via a segment register prefix.
+
+For example:
+
+ DEFINE_PER_CPU(int, x);
+ int z;
+
+ z = this_cpu_read(x);
+
+results in a single instruction
+
+ mov ax, gs:[x]
+
+instead of the sequence of an address calculation followed by a fetch
+from that address, which is what the percpu operations perform. Before
+this_cpu_ops such a sequence also required preempt disable/enable to
+prevent the kernel from moving the thread to a different processor
+while the calculation is performed.
+
+The main use of the this_cpu operations has been to optimize counter
+operations.
+
+ this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+ inc gs:[x]
+
+instead of the following operations required if there is no segment
+register.
+
+ int *y;
+ int cpu;
+
+ cpu = get_cpu();
+ y = per_cpu_ptr(&x, cpu);
+ (*y)++;
+ put_cpu();
+
+Note that these operations can only be used on percpu data that is
+reserved for a specific processor. Without disabling preemption in the
+surrounding code this_cpu_inc() will only guarantee that one of the
+percpu counters is correctly incremented. However, there is no
+guarantee that the OS will not move the process directly before or
+after the this_cpu instruction is executed. In general this means that
+the values of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value
+that is of interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache
+lines can be avoided if multiple processors concurrently go through
+the same code paths. Since each processor has its own per cpu
+variables no concurrent cacheline updates take place. The price that
+has to be paid for this optimization is the need to add up the per cpu
+counters when the value of the counter is needed.
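The summing cost described above can be illustrated with a small user space model. This is only a sketch: the array, the CPU count and the function names are hypothetical stand-ins, and real kernel code would use DEFINE_PER_CPU, this_cpu_inc() and a loop over the possible CPUs instead.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for this model */

/* One counter slot per CPU; stands in for a DEFINE_PER_CPU counter. */
static long counters[NR_CPUS_MODEL];

/* Models this_cpu_inc(): touches only the slot of the executing CPU. */
void model_inc(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	counters[cpu]++;
}

/* The only meaningful value: the sum over all per CPU slots. */
long model_sum(void)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += counters[cpu];
	return sum;
}
```

The increment path never touches another CPU's slot, so the read side pays by walking every slot.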
+
+
+Special operations:
+-------------------
+
+ y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address
+of the per cpu variable that belongs to the currently executing
+processor. this_cpu_ptr avoids multiple steps that the common
+get_cpu/put_cpu sequence requires. No processor number is
+available. Instead the offset of the local per cpu area is simply
+added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu
+area. They do not have addresses although they look like that in the
+code. Offsets cannot be directly dereferenced. The offset must be
+added to a base pointer of a percpu area of a processor in order to
+form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu
+operations is invalid and will generally be treated like a NULL
+pointer dereference.
+
+In the context of per cpu operations
+
+ x is a per cpu variable. Most this_cpu operations take a per cpu
+ variable.
+
+ &x is the *offset* of a per cpu variable. this_cpu_ptr() takes
+ the offset of a per cpu variable which makes this look a bit
+ strange.
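The offset-plus-base rule can be modelled in plain C. This is a sketch under stated assumptions: the areas, their size and model_per_cpu_ptr() are hypothetical and only imitate what per_cpu_ptr() does with a real percpu base.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CPUS 2		/* hypothetical CPU count */
#define MODEL_AREA_LONGS 8	/* hypothetical area size, in longs */

/* One "per cpu area" per processor. */
static long model_area[MODEL_CPUS][MODEL_AREA_LONGS];

/*
 * Add a byte offset to the base of the area that belongs to the given
 * CPU. The offset alone is not an address; only base + offset is.
 */
long *model_per_cpu_ptr(size_t offset, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_CPUS);
	assert(offset % sizeof(long) == 0);
	assert(offset < sizeof(model_area[0]));
	return model_area[cpu] + offset / sizeof(long);
}
```

The same offset yields a different valid address for every CPU, which is why the bare offset must never be dereferenced.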
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Let's say we have a percpu structure
+
+ struct s {
+ int n,m;
+ };
+
+ DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+ this_cpu_inc(p.m)
+
+ z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+ struct s __percpu *ps = &p;
+
+ this_cpu_dec(ps->m);
+
+ z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr()
+if we do not make use of this_cpu ops later to manipulate fields:
+
+ struct s *pp;
+
+ pp = this_cpu_ptr(&p);
+
+ pp->m--;
+
+ z = pp->n++;
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interrupt safe. Some architectures do not support
+these per cpu local operations. In that case the operation must be
+replaced by code that disables interrupts, then does the operations
+that are guaranteed to be atomic and then re-enables interrupts. Doing
+so is expensive. If there are other reasons why the scheduler cannot
+change the processor we are executing on then there is no reason to
+disable interrupts. For that purpose the __this_cpu operations are
+provided. For example:
+
+ __this_cpu_inc(x);
+
+will increment x and will not fall back to code that disables
+interrupts on platforms that cannot accomplish atomicity through
+address relocation and a Read-Modify-Write operation in the same
+instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then
+adds the offset of the n field.
+
+The second one first adds the two offsets and then does the
+relocation. IMHO the second form looks cleaner and has an easier time
+with (). The second form also is consistent with the way
+this_cpu_read() and friends are used.
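That both forms yield the same address can be checked with a user space model. This is only a sketch: the area layout and relocate() are hypothetical stand-ins for the per cpu area and for the relocation that this_cpu_ptr() performs.

```c
#include <assert.h>
#include <stddef.h>

struct s {
	int n, m;
};

/* Hypothetical stand-in for the per cpu area of the current processor. */
struct model_area {
	long pad;
	struct s p;
};
static struct model_area area;

/* Models the relocation step: area base plus offset. */
static unsigned char *relocate(size_t offset)
{
	assert(offset < sizeof(area));
	return (unsigned char *)&area + offset;
}

/* &this_cpu_ptr(pp)->n: relocate the structure, then select the field. */
int *relocate_then_field(size_t pp)
{
	return &((struct s *)relocate(pp))->n;
}

/* this_cpu_ptr(&pp->n): add the field offset, then relocate. */
int *field_then_relocate(size_t pp)
{
	return (int *)relocate(pp + offsetof(struct s, n));
}
```

Since both are plain additions, the order does not change the result; the choice between the two forms is one of readability.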
+
+
+Christoph Lameter, April 3rd, 2013
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 53d6a3c51d87..bfe8c29b1f1d 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -8,6 +8,7 @@ Copyright 2008 Red Hat Inc.
Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
John Kacur, and David Teigland.
Written for: 2.6.28-rc2
+Updated for: 3.10
Introduction
------------
@@ -17,13 +18,16 @@ designers of systems to find what is going on inside the kernel.
It can be used for debugging or analyzing latencies and
performance issues that take place outside of user-space.
-Although ftrace is the function tracer, it also includes an
-infrastructure that allows for other types of tracing. Some of
-the tracers that are currently in ftrace include a tracer to
-trace context switches, the time it takes for a high priority
-task to run after it was woken up, the time interrupts are
-disabled, and more (ftrace allows for tracer plugins, which
-means that the list of tracers can always grow).
+Although ftrace is typically considered the function tracer, it
+is really a framework of several assorted tracing utilities.
+There's latency tracing to examine what occurs between interrupts
+disabled and enabled, as well as for preemption and from the time
+a task is woken to the time the task is actually scheduled in.
+
+One of the most common uses of ftrace is event tracing.
+Throughout the kernel are hundreds of static event points that
+can be enabled via the debugfs file system to see what is
+going on in certain parts of the kernel.
Implementation Details
@@ -61,7 +65,7 @@ the extended "/sys/kernel/debug/tracing" path name.
That's it! (assuming that you have ftrace configured into your kernel)
-After mounting the debugfs, you can see a directory called
+After mounting debugfs, you can see a directory called
"tracing". This directory contains the control and output files
of ftrace. Here is a list of some of the key files:
@@ -84,7 +88,9 @@ of ftrace. Here is a list of some of the key files:
This sets or displays whether writing to the trace
ring buffer is enabled. Echo 0 into this file to disable
- the tracer or 1 to enable it.
+ the tracer or 1 to enable it. Note, this only disables
+ writing to the ring buffer; the tracing overhead may
+ still be occurring.
trace:
@@ -109,7 +115,15 @@ of ftrace. Here is a list of some of the key files:
This file lets the user control the amount of data
that is displayed in one of the above output
- files.
+ files. Options also exist to modify how a tracer
+ or events work (stack traces, timestamps, etc).
+
+ options:
+
+ This is a directory that has a file for every available
+ trace option (also in trace_options). Options may also be set
+ or cleared by writing a "1" or "0" respectively into the
+ corresponding file with the option name.
tracing_max_latency:
@@ -121,10 +135,17 @@ of ftrace. Here is a list of some of the key files:
latency is greater than the value in this
file. (in microseconds)
+ tracing_thresh:
+
+ Some latency tracers will record a trace whenever the
+ latency is greater than the number in this file.
+ Only active when the file contains a number greater than 0.
+ (in microseconds)
+
buffer_size_kb:
This sets or displays the number of kilobytes each CPU
- buffer can hold. The tracer buffers are the same size
+ buffer holds. By default, the trace buffers are the same size
for each CPU. The displayed number is the size of the
CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory
@@ -133,16 +154,30 @@ of ftrace. Here is a list of some of the key files:
than requested, the rest of the page will be used,
making the actual allocation bigger than requested.
( Note, the size may not be a multiple of the page size
- due to buffer management overhead. )
+ due to buffer management meta-data. )
- This can only be updated when the current_tracer
- is set to "nop".
+ buffer_total_size_kb:
+
+ This displays the total combined size of all the trace buffers.
+
+ free_buffer:
+
+ If a process is performing the tracing, and the ring buffer
+ should be shrunk ("freed") when the process is finished, even
+ if it were to be killed by a signal, this file can be used
+ for that purpose. On close of this file, the ring buffer will
+ be resized to its minimum size. If the tracing process also
+ opens this file, then when the process exits, its file
+ descriptor for this file is closed, and in doing so, the ring
+ buffer is "freed".
+
+ It may also stop tracing if the disable_on_free option is set.
tracing_cpumask:
This is a mask that lets the user only trace
- on specified CPUS. The format is a hex string
- representing the CPUS.
+ on specified CPUs. The format is a hex string
+ representing the CPUs.
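As an illustration of the hex string format, the helper below builds a mask from a list of CPU numbers. It is a sketch only: cpumask_hex() is a hypothetical user space helper, and it assumes CPU numbers below 32 so that the mask fits in a single hex word.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a mask with one bit set per listed CPU as a hex string,
 * for example CPUs 0, 2 and 3 give "d". Returns what snprintf() returns.
 */
int cpumask_hex(const int *cpus, int n, char *buf, size_t len)
{
	unsigned long mask = 0;
	int i;

	assert(cpus && buf && len > 0);

	for (i = 0; i < n; i++) {
		assert(cpus[i] >= 0 && cpus[i] < 32);
		mask |= 1UL << cpus[i];
	}
	return snprintf(buf, len, "%lx", mask);
}
```

The resulting string is what would be written into tracing_cpumask to restrict tracing to those CPUs.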
set_ftrace_filter:
@@ -183,6 +218,261 @@ of ftrace. Here is a list of some of the key files:
"set_ftrace_notrace". (See the section "dynamic ftrace"
below for more details.)
+ enabled_functions:
+
+ This file is more for debugging ftrace, but can also be useful
+ in seeing if any function has a callback attached to it.
+ Not only does the trace infrastructure use ftrace function
+ trace utility, but other subsystems might too. This file
+ displays all functions that have a callback attached to them
+ as well as the number of callbacks that have been attached.
+ Note, a callback may also call multiple functions which will
+ not be listed in this count.
+
+ If a callback is registered to trace a function with
+ the "save regs" attribute (thus even more overhead), an 'R'
+ will be displayed on the same line as the function that
+ is returning registers.
+
+ function_profile_enabled:
+
+ When set it will enable all functions with either the function
+ tracer, or if enabled, the function graph tracer. It will
+ keep a histogram of the number of functions that were called
+ and if run with the function graph tracer, it will also keep
+ track of the time spent in those functions. The histogram
+ content can be displayed in the files:
+
+ trace_stat/function<cpu> ( function0, function1, etc).
+
+ trace_stat:
+
+ A directory that holds different tracing stats.
+
+ kprobe_events:
+
+ Enable dynamic trace points. See kprobetrace.txt.
+
+ kprobe_profile:
+
+ Dynamic trace points stats. See kprobetrace.txt.
+
+ max_graph_depth:
+
+ Used with the function graph tracer. This is the max depth
+ it will trace into a function. Setting this to a value of
+ one will show only the first kernel function that is called
+ from user space.
+
+ printk_formats:
+
+ This is for tools that read the raw format files. If an event in
+ the ring buffer references a string (currently only trace_printk()
+ does this), only a pointer to the string is recorded into the buffer
+ and not the string itself. This prevents tools from knowing what
+ that string was. This file displays the string and address for
+ the string allowing tools to map the pointers to what the
+ strings were.
+
+ saved_cmdlines:
+
+ Only the pid of the task is recorded in a trace event unless
+ the event specifically saves the task comm as well. Ftrace
+ makes a cache of pid mappings to comms to try to display
+ comms for events. If a comm for a pid is not listed, then
+ "<...>" is displayed in the output.
+
+ snapshot:
+
+ This displays the "snapshot" buffer and also lets the user
+ take a snapshot of the current running trace.
+ See the "Snapshot" section below for more details.
+
+ stack_max_size:
+
+ When the stack tracer is activated, this will display the
+ maximum stack size it has encountered.
+ See the "Stack Trace" section below.
+
+ stack_trace:
+
+ This displays the stack back trace of the largest stack
+ that was encountered when the stack tracer is activated.
+ See the "Stack Trace" section below.
+
+ stack_trace_filter:
+
+ This is similar to "set_ftrace_filter" but it limits what
+ functions the stack tracer will check.
+
+ trace_clock:
+
+ Whenever an event is recorded into the ring buffer, a
+ "timestamp" is added. This stamp comes from a specified
+ clock. By default, ftrace uses the "local" clock. This
+ clock is very fast and strictly per cpu, but on some
+ systems it may not be monotonic with respect to other
+ CPUs. In other words, the local clocks may not be in sync
+ with local clocks on other CPUs.
+
+ Usual clocks for tracing:
+
+ # cat trace_clock
+ [local] global counter x86-tsc
+
+ local: Default clock, but may not be in sync across CPUs
+
+ global: This clock is in sync with all CPUs but may
+ be a bit slower than the local clock.
+
+ counter: This is not a clock at all, but literally an atomic
+ counter. It counts up one by one, but is in sync
+ with all CPUs. This is useful when you need to
+ know exactly the order events occurred with respect to
+ each other on different CPUs.
+
+ uptime: This uses the jiffies counter and the time stamp
+ is relative to the time since boot up.
+
+ perf: This makes ftrace use the same clock that perf uses.
+ Eventually perf will be able to read ftrace buffers
+ and this will help out in interleaving the data.
+
+ x86-tsc: Architectures may define their own clocks. For
+ example, x86 uses its own TSC cycle clock here.
+
+ To set a clock, simply echo the clock name into this file.
+
+ echo global > trace_clock
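When reading trace_clock back, the active clock is the name shown in brackets. A tool could pick it out as sketched below; active_clock() is a hypothetical helper, not part of any ftrace library.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the bracketed (active) clock name from a trace_clock line into
 * out. Returns 0 on success, -1 if no bracketed name fits in out.
 */
int active_clock(const char *line, char *out, size_t len)
{
	const char *lb, *rb;
	size_t n;

	assert(line && out && len > 0);

	lb = strchr(line, '[');
	rb = lb ? strchr(lb, ']') : NULL;
	if (!rb)
		return -1;

	n = (size_t)(rb - lb - 1);
	if (n >= len)
		return -1;

	memcpy(out, lb + 1, n);
	out[n] = '\0';
	return 0;
}
```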
+
+ trace_marker:
+
+ This is a very useful file for synchronizing user space
+ with events happening in the kernel. Strings written into
+ this file are recorded in the ftrace buffer.
+
+ It is useful in applications to open this file at the start
+ of the application and just reference the file descriptor
+ for the file.
+
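+ #include <fcntl.h>
+ #include <stdarg.h>
+ #include <stdio.h>
+ #include <unistd.h>
+
+ static int trace_fd = -1;
+
+ void trace_write(const char *fmt, ...)
+ {
+	va_list ap;
+	char buf[256];
+	int n;
+
+	if (trace_fd < 0)
+		return;
+
+	va_start(ap, fmt);
+	n = vsnprintf(buf, sizeof(buf), fmt, ap);
+	va_end(ap);
+
+	if (n < 0)
+		return;
+	/* vsnprintf() returns the untruncated length */
+	if (n >= (int)sizeof(buf))
+		n = sizeof(buf) - 1;
+
+	write(trace_fd, buf, n);
+ }
+
+ start:
+
+	trace_fd = open("trace_marker", O_WRONLY);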
+
+ uprobe_events:
+
+ Add dynamic tracepoints in programs.
+ See uprobetracer.txt
+
+ uprobe_profile:
+
+ Uprobe statistics. See uprobetracer.txt
+
+ instances:
+
+ This is a way to make multiple trace buffers where different
+ events can be recorded in different buffers.
+ See "Instances" section below.
+
+ events:
+
+ This is the trace event directory. It holds event tracepoints
+ (also known as static tracepoints) that have been compiled
+ into the kernel. It shows what event tracepoints exist
+ and how they are grouped by system. There are "enable"
+ files at various levels that can enable the tracepoints
+ when a "1" is written to them.
+
+ See events.txt for more information.
+
+ per_cpu:
+
+ This is a directory that contains the trace per_cpu information.
+
+ per_cpu/cpu0/buffer_size_kb:
+
+ The ftrace buffer is defined per_cpu. That is, there's a separate
+ buffer for each CPU to allow writes to be done atomically,
+ and free from cache bouncing. These buffers may have different
+ sizes. This file is similar to the buffer_size_kb
+ file, but it only displays or sets the buffer size for the
+ specific CPU (here cpu0).
+
+ per_cpu/cpu0/trace:
+
+ This is similar to the "trace" file, but it will only display
+ the data specific for the CPU. If written to, it only clears
+ the specific CPU buffer.
+
+ per_cpu/cpu0/trace_pipe:
+
+ This is similar to the "trace_pipe" file, and is a consuming
+ read, but it will only display (and consume) the data specific
+ for the CPU.
+
+ per_cpu/cpu0/trace_pipe_raw:
+
+ For tools that can parse the ftrace ring buffer binary format,
+ the trace_pipe_raw file can be used to extract the data
+ from the ring buffer directly. With the use of the splice()
+ system call, the buffer data can be quickly transferred to
+ a file or to the network where a server is collecting the
+ data.
+
+ Like trace_pipe, this is a consuming reader, where multiple
+ reads will always produce different data.
+
+ per_cpu/cpu0/snapshot:
+
+ This is similar to the main "snapshot" file, but will only
+ snapshot the current CPU (if supported). It only displays
+ the content of the snapshot for a given CPU, and if
+ written to, only clears this CPU buffer.
+
+ per_cpu/cpu0/snapshot_raw:
+
+ Similar to the trace_pipe_raw, but will read the binary format
+ from the snapshot buffer for the given CPU.
+
+ per_cpu/cpu0/stats:
+
+ This displays certain stats about the ring buffer:
+
+ entries: The number of events that are still in the buffer.
+
+ overrun: The number of lost events due to overwriting when
+ the buffer was full.
+
+ commit overrun: Should always be zero.
+ This gets set if so many events happened within a nested
+ event (ring buffer is re-entrant), that it fills the
+ buffer and starts dropping events.
+
+ bytes: Bytes actually read (not overwritten).
+
+ oldest event ts: The oldest timestamp in the buffer
+
+ now ts: The current timestamp
+
+ dropped events: Events lost due to overwrite option being off.
+
+ read events: The number of events read.
The Tracers
-----------
@@ -234,11 +524,6 @@ Here is the list of current tracers that may be configured.
RT tasks (as the current "wakeup" does). This is useful
for those interested in wake up timings of RT tasks.
- "hw-branch-tracer"
-
- Uses the BTS CPU feature on x86 CPUs to traces all
- branches executed.
-
"nop"
This is the "trace nothing" tracer. To remove all
@@ -261,70 +546,100 @@ Here is an example of the output format of the file "trace"
--------
# tracer: function
#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4251 [01] 10152.583854: path_put <-path_walk
- bash-4251 [01] 10152.583855: dput <-path_put
- bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput
+# entries-in-buffer/entries-written: 140080/250280 #P:4
+#
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ bash-1977 [000] .... 17284.993652: sys_close <-system_call_fastpath
+ bash-1977 [000] .... 17284.993653: __close_fd <-sys_close
+ bash-1977 [000] .... 17284.993653: _raw_spin_lock <-__close_fd
+ sshd-1974 [003] .... 17284.993653: __srcu_read_unlock <-fsnotify
+ bash-1977 [000] .... 17284.993654: add_preempt_count <-_raw_spin_lock
+ bash-1977 [000] ...1 17284.993655: _raw_spin_unlock <-__close_fd
+ bash-1977 [000] ...1 17284.993656: sub_preempt_count <-_raw_spin_unlock
+ bash-1977 [000] .... 17284.993657: filp_close <-__close_fd
+ bash-1977 [000] .... 17284.993657: dnotify_flush <-filp_close
+ sshd-1974 [003] .... 17284.993658: sys_select <-system_call_fastpath
--------
A header is printed with the tracer name that is represented by
-the trace. In this case the tracer is "function". Then a header
-showing the format. Task name "bash", the task PID "4251", the
-CPU that it was running on "01", the timestamp in <secs>.<usecs>
-format, the function name that was traced "path_put" and the
-parent function that called this function "path_walk". The
-timestamp is the time at which the function was entered.
+the trace. In this case the tracer is "function". Then it shows the
+number of events in the buffer as well as the total number of entries
+that were written. The difference is the number of entries that were
+lost due to the buffer filling up (250280 - 140080 = 110200 events
+lost).
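The subtraction above can be automated by a tool that reads the header. A minimal sketch, assuming the header line has exactly the layout shown in the example; lost_events() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse "# entries-in-buffer/entries-written: A/B" and return B - A,
 * the number of events lost. Returns -1 if the line does not match.
 */
long lost_events(const char *header)
{
	long in_buffer, written;

	assert(header);

	if (sscanf(header, "# entries-in-buffer/entries-written: %ld/%ld",
		   &in_buffer, &written) != 2)
		return -1;
	return written - in_buffer;
}
```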
+
+The header explains the content of the events. Task name "bash", the task
+PID "1977", the CPU that it was running on "000", the latency format
+(explained below), the timestamp in <secs>.<usecs> format, the
+function name that was traced "sys_close" and the parent function that
+called this function "system_call_fastpath". The timestamp is the time
+at which the function was entered.
Latency trace format
--------------------
-When the latency-format option is enabled, the trace file gives
-somewhat more information to see why a latency happened.
-Here is a typical trace.
+When the latency-format option is enabled or when one of the latency
+tracers is set, the trace file gives somewhat more information to see
+why a latency happened. Here is a typical trace.
# tracer: irqsoff
#
-irqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: apic_timer_interrupt
- => ended at: do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- <idle>-0 0d..1 0us+: trace_hardirqs_off_thunk (apic_timer_interrupt)
- <idle>-0 0d.s. 97us : __do_softirq (do_softirq)
- <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq)
+# irqsoff latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 259 us, #4/4, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: ps-6143 (uid:0 nice:0 policy:0 rt_prio:0)
+# -----------------
+# => started at: __lock_task_sighand
+# => ended at: _raw_spin_unlock_irqrestore
+#
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ ps-6143 2d... 0us!: trace_hardirqs_off <-__lock_task_sighand
+ ps-6143 2d..1 259us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
+ ps-6143 2d..1 263us+: time_hardirqs_on <-_raw_spin_unlock_irqrestore
+ ps-6143 2d..1 306us : <stack trace>
+ => trace_hardirqs_on_caller
+ => trace_hardirqs_on
+ => _raw_spin_unlock_irqrestore
+ => do_task_stat
+ => proc_tgid_stat
+ => proc_single_show
+ => seq_read
+ => vfs_read
+ => sys_read
+ => system_call_fastpath
This shows that the current tracer is "irqsoff" tracing the time
-for which interrupts were disabled. It gives the trace version
-and the version of the kernel upon which this was executed on
-(2.6.26-rc8). Then it displays the max latency in microsecs (97
-us). The number of trace entries displayed and the total number
-recorded (both are three: #3/3). The type of preemption that was
-used (PREEMPT). VP, KP, SP, and HP are always zero and are
-reserved for later use. #P is the number of online CPUS (#P:2).
+for which interrupts were disabled. It gives the trace version (which
+never changes) and the version of the kernel on which this was executed
+(3.8.0-test+). Then it displays the max latency in microseconds (259 us),
+followed by the number of trace entries displayed and the total number
+recorded (both are four: #4/4).
+VP, KP, SP, and HP are always zero and are reserved for later use.
+#P is the number of online CPUs (#P:4).
The task is the process that was running when the latency
-occurred. (swapper pid: 0).
+occurred. (ps pid: 6143).
The start and stop (the functions in which the interrupts were
disabled and enabled respectively) that caused the latencies:
- apic_timer_interrupt is where the interrupts were disabled.
- do_softirq is where they were enabled again.
+ __lock_task_sighand is where the interrupts were disabled.
+ _raw_spin_unlock_irqrestore is where they were enabled again.
The next lines after the header are the trace itself. The header
explains which is which.
@@ -367,16 +682,43 @@ The above is mostly meaningful for kernel developers.
The rest is the same as the 'trace' file.
+ Note, the latency tracers will usually end with a back trace
+ to easily find where the latency occurred.
trace_options
-------------
-The trace_options file is used to control what gets printed in
-the trace output. To see what is available, simply cat the file:
+The trace_options file (or the options directory) is used to control
+what gets printed in the trace output, or manipulate the tracers.
+To see what is available, simply cat the file:
cat trace_options
- print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
- noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
+print-parent
+nosym-offset
+nosym-addr
+noverbose
+noraw
+nohex
+nobin
+noblock
+nostacktrace
+trace_printk
+noftrace_preempt
+nobranch
+annotate
+nouserstacktrace
+nosym-userobj
+noprintk-msg-only
+context-info
+latency-format
+sleep-time
+graph-time
+record-cmd
+overwrite
+nodisable_on_free
+irq-info
+markers
+function-trace
To disable one of the options, echo in the option prepended with
"no".
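
The "no" prefix convention can be sketched in a few lines of Python. This
is a minimal illustration, not an ftrace interface: the function names are
made up for the example, the tracing directory is supplied by the caller,
and the parser assumes that no option name itself begins with "no" (which
holds for the list shown above).

```python
import os


def option_token(name, enable):
    """Return the string to write to trace_options for one option."""
    return name if enable else "no" + name


def parse_trace_options(text):
    """Map option name -> enabled, given the output of `cat trace_options`.

    Assumption: a leading "no" always means "disabled".
    """
    state = {}
    for token in text.split():
        if token.startswith("no"):
            state[token[2:]] = False
        else:
            state[token] = True
    return state


def set_option(tracing_dir, name, enable):
    """Write the option token to <tracing_dir>/trace_options (needs root)."""
    path = os.path.join(tracing_dir, "trace_options")
    with open(path, "w") as f:
        f.write(option_token(name, enable) + "\n")


print(option_token("stacktrace", False))
print(parse_trace_options("print-parent\nnosym-offset\nannotate\n"))
```

Only set_option touches the file system; the other two helpers are pure
and can be tried on any machine.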
@@ -428,13 +770,34 @@ Here are the available options:
bin - This will print out the formats in raw binary.
- block - TBD (needs update)
+ block - When set, reading trace_pipe will not block when polled.
stacktrace - This is one of the options that changes the trace
itself. When a trace is recorded, so is the stack
of functions. This allows for back traces of
trace sites.
+ trace_printk - Can disable trace_printk() from writing into the buffer.
+
+ branch - Enable branch tracing with the tracer.
+
+ annotate - It is sometimes confusing when the CPU buffers are full
+ and one CPU buffer had a lot of events recently, thus
+ a shorter time frame, where another CPU may have only had
+ a few events, which lets it have older events. When
+ the trace is reported, it shows the oldest events first,
+ and it may look like only one CPU ran (the one with the
+ oldest events). When the annotate option is set, it will
+ display when a new CPU buffer started:
+
+ <idle>-0 [001] dNs4 21169.031481: wake_up_idle_cpu <-add_timer_on
+ <idle>-0 [001] dNs4 21169.031482: _raw_spin_unlock_irqrestore <-add_timer_on
+ <idle>-0 [001] .Ns4 21169.031484: sub_preempt_count <-_raw_spin_unlock_irqrestore
+##### CPU 2 buffer started ####
+ <idle>-0 [002] .N.1 21169.031484: rcu_idle_exit <-cpu_idle
+ <idle>-0 [001] .Ns3 21169.031484: _raw_spin_unlock <-clocksource_watchdog
+ <idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock
+
userstacktrace - This option changes the trace. It records a
stacktrace of the current userspace thread.
@@ -451,9 +814,13 @@ Here are the available options:
a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
- sched-tree - trace all tasks that are on the runqueue, at
- every scheduling event. Will add overhead if
- there's a lot of tasks running at once.
+
+ printk-msg-only - When set, trace_printk()s will only show the format
+ and not their parameters (if trace_bprintk() or
+ trace_bputs() was used to save the trace_printk()).
+
+ context-info - When disabled, show only the event data. This hides
+ the comm, PID, timestamp, CPU, and other useful data.
latency-format - This option changes the trace. When
it is enabled, the trace displays
@@ -461,31 +828,61 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
latencies, as described in "Latency
trace format".
+ sleep-time - When running the function graph tracer, include
+ the time a task is scheduled out in its function.
+ When enabled, the time the task has been scheduled
+ out is accounted as part of the function call.
+
+ graph-time - When running the function graph tracer, include the
+ time spent in nested functions. When this is not set,
+ the time reported for the function will only include
+ the time the function itself executed for, not the time
+ for functions that it called.
+
+ record-cmd - When any event or tracer is enabled, a hook is enabled
+ in the sched_switch trace point to fill comm cache
+ with mapped pids and comms. But this may cause some
+ overhead, and if you only care about pids, and not the
+ name of the task, disabling this option can lower the
+ impact of tracing.
+
overwrite - This controls what happens when the trace buffer is
full. If "1" (default), the oldest events are
discarded and overwritten. If "0", then the newest
events are discarded.
+ (see per_cpu/cpu0/stats for overrun and dropped)
-ftrace_enabled
---------------
+ disable_on_free - When the free_buffer is closed, tracing will
+ stop (tracing_on set to 0).
-The following tracers (listed below) give different output
-depending on whether or not the sysctl ftrace_enabled is set. To
-set ftrace_enabled, one can either use the sysctl function or
-set it via the proc file system interface.
+ irq-info - Shows the interrupt, preempt count, need resched data.
+ When disabled, the trace looks like:
- sysctl kernel.ftrace_enabled=1
+# tracer: function
+#
+# entries-in-buffer/entries-written: 144405/9452052 #P:4
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ <idle>-0 [002] 23636.756054: ttwu_do_activate.constprop.89 <-try_to_wake_up
+ <idle>-0 [002] 23636.756054: activate_task <-ttwu_do_activate.constprop.89
+ <idle>-0 [002] 23636.756055: enqueue_task <-activate_task
- or
- echo 1 > /proc/sys/kernel/ftrace_enabled
+ markers - When set, the trace_marker is writable (only by root).
+ When disabled, the trace_marker will error with EINVAL
+ on write.
+
+
+ function-trace - The latency tracers will enable function tracing
+ if this option is enabled (it is by default). When
+ it is disabled, the latency tracers do not trace
+ functions. This keeps the overhead of the tracer down
+ when performing latency tests.
-To disable ftrace_enabled simply replace the '1' with '0' in the
-above commands.
+ Note: Some tracers have their own options. They only appear
+ when the tracer is active.
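
The two behaviours of the overwrite option described above can be modelled
with a toy buffer. The Python sketch below is not how the kernel ring
buffer is implemented; it only mimics the policy (drop the oldest event
versus drop the newest), and the function name and return value are
assumptions made for the example.

```python
from collections import deque


def record(events, capacity, overwrite=True):
    """Toy model of a full trace buffer.

    overwrite=True  -> the oldest event is discarded to make room.
    overwrite=False -> the new event is discarded instead.
    Returns (events kept, number of events lost).
    """
    buf = deque()
    lost = 0
    for ev in events:
        if len(buf) < capacity:
            buf.append(ev)
        elif overwrite:
            buf.popleft()  # throw away the oldest entry
            buf.append(ev)
            lost += 1
        else:
            lost += 1      # buffer stays as it is, newest entry is lost
    return list(buf), lost


print(record(range(5), 3, overwrite=True))
print(record(range(5), 3, overwrite=False))
```

With overwrite set, the buffer ends up holding the most recent events;
with it cleared, the buffer keeps the events from the start of the run.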
-When ftrace_enabled is set the tracers will also record the
-functions that are within the trace. The descriptions of the
-tracers will also show an example with ftrace enabled.
irqsoff
@@ -506,95 +903,133 @@ new trace is saved.
To reset the maximum, echo 0 into tracing_max_latency. Here is
an example:
+ # echo 0 > options/function-trace
# echo irqsoff > current_tracer
- # echo latency-format > trace_options
- # echo 0 > tracing_max_latency
# echo 1 > tracing_on
+ # echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
# tracer: irqsoff
#
-irqsoff latency trace v1.1.5 on 2.6.26
---------------------------------------------------------------------
- latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: sys_setpgid
- => ended at: sys_setpgid
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- bash-3730 1d... 0us : _write_lock_irq (sys_setpgid)
- bash-3730 1d..1 1us+: _write_unlock_irq (sys_setpgid)
- bash-3730 1d..2 14us : trace_hardirqs_on (sys_setpgid)
-
-
-Here we see that that we had a latency of 12 microsecs (which is
-very good). The _write_lock_irq in sys_setpgid disabled
-interrupts. The difference between the 12 and the displayed
-timestamp 14us occurred because the clock was incremented
+# irqsoff latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
+# -----------------
+# => started at: run_timer_softirq
+# => ended at: run_timer_softirq
+#
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ <idle>-0 0d.s2 0us+: _raw_spin_lock_irq <-run_timer_softirq
+ <idle>-0 0dNs3 17us : _raw_spin_unlock_irq <-run_timer_softirq
+ <idle>-0 0dNs3 17us+: trace_hardirqs_on <-run_timer_softirq
+ <idle>-0 0dNs3 25us : <stack trace>
+ => _raw_spin_unlock_irq
+ => run_timer_softirq
+ => __do_softirq
+ => call_softirq
+ => do_softirq
+ => irq_exit
+ => smp_apic_timer_interrupt
+ => apic_timer_interrupt
+ => rcu_idle_exit
+ => cpu_idle
+ => rest_init
+ => start_kernel
+ => x86_64_start_reservations
+ => x86_64_start_kernel
+
+Here we see that we had a latency of 16 microseconds (which is
+very good). The _raw_spin_lock_irq in run_timer_softirq disabled
+interrupts. The difference between the 16 and the displayed
+timestamp 25us occurred because the clock was incremented
between the time of recording the max latency and the time of
recording the function that had that latency.
-Note the above example had ftrace_enabled not set. If we set the
-ftrace_enabled, we get a much larger output:
+Note the above example had function-trace not set. If we set
+function-trace, we get a much larger output:
+
+ with echo 1 > options/function-trace
# tracer: irqsoff
#
-irqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 50 us, #101/101, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: ls-4339 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: __alloc_pages_internal
- => ended at: __alloc_pages_internal
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- ls-4339 0...1 0us+: get_page_from_freelist (__alloc_pages_internal)
- ls-4339 0d..1 3us : rmqueue_bulk (get_page_from_freelist)
- ls-4339 0d..1 3us : _spin_lock (rmqueue_bulk)
- ls-4339 0d..1 4us : add_preempt_count (_spin_lock)
- ls-4339 0d..2 4us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 5us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 5us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 6us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 6us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 7us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 7us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 8us : __rmqueue_smallest (__rmqueue)
+# irqsoff latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 71 us, #168/168, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: bash-2042 (uid:0 nice:0 policy:0 rt_prio:0)
+# -----------------
+# => started at: ata_scsi_queuecmd
+# => ended at: ata_scsi_queuecmd
+#
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ bash-2042 3d... 0us : _raw_spin_lock_irqsave <-ata_scsi_queuecmd
+ bash-2042 3d... 0us : add_preempt_count <-_raw_spin_lock_irqsave
+ bash-2042 3d..1 1us : ata_scsi_find_dev <-ata_scsi_queuecmd
+ bash-2042 3d..1 1us : __ata_scsi_find_dev <-ata_scsi_find_dev
+ bash-2042 3d..1 2us : ata_find_dev.part.14 <-__ata_scsi_find_dev
+ bash-2042 3d..1 2us : ata_qc_new_init <-__ata_scsi_queuecmd
+ bash-2042 3d..1 3us : ata_sg_init <-__ata_scsi_queuecmd
+ bash-2042 3d..1 4us : ata_scsi_rw_xlat <-__ata_scsi_queuecmd
+ bash-2042 3d..1 4us : ata_build_rw_tf <-ata_scsi_rw_xlat
[...]
- ls-4339 0d..2 46us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 47us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 47us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 48us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 48us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 49us : _spin_unlock (rmqueue_bulk)
- ls-4339 0d..2 49us : sub_preempt_count (_spin_unlock)
- ls-4339 0d..1 50us : get_page_from_freelist (__alloc_pages_internal)
- ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal)
-
-
-
-Here we traced a 50 microsecond latency. But we also see all the
+ bash-2042 3d..1 67us : delay_tsc <-__delay
+ bash-2042 3d..1 67us : add_preempt_count <-delay_tsc
+ bash-2042 3d..2 67us : sub_preempt_count <-delay_tsc
+ bash-2042 3d..1 67us : add_preempt_count <-delay_tsc
+ bash-2042 3d..2 68us : sub_preempt_count <-delay_tsc
+ bash-2042 3d..1 68us+: ata_bmdma_start <-ata_bmdma_qc_issue
+ bash-2042 3d..1 71us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
+ bash-2042 3d..1 71us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
+ bash-2042 3d..1 72us+: trace_hardirqs_on <-ata_scsi_queuecmd
+ bash-2042 3d..1 120us : <stack trace>
+ => _raw_spin_unlock_irqrestore
+ => ata_scsi_queuecmd
+ => scsi_dispatch_cmd
+ => scsi_request_fn
+ => __blk_run_queue_uncond
+ => __blk_run_queue
+ => blk_queue_bio
+ => generic_make_request
+ => submit_bio
+ => submit_bh
+ => __ext3_get_inode_loc
+ => ext3_iget
+ => ext3_lookup
+ => lookup_real
+ => __lookup_hash
+ => walk_component
+ => lookup_last
+ => path_lookupat
+ => filename_lookup
+ => user_path_at_empty
+ => user_path_at
+ => vfs_fstatat
+ => vfs_stat
+ => sys_newstat
+ => system_call_fastpath
+
+
+Here we traced a 71 microsecond latency. But we also see all the
functions that were called during that time. Note that by
enabling function tracing, we incur an added overhead. This
overhead may extend the latency times. But nevertheless, this
@@ -614,120 +1049,122 @@ Like the irqsoff tracer, it records the maximum latency for
which preemption was disabled. The control of preemptoff tracer
is much like the irqsoff tracer.
+ # echo 0 > options/function-trace
# echo preemptoff > current_tracer
- # echo latency-format > trace_options
- # echo 0 > tracing_max_latency
# echo 1 > tracing_on
+ # echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
# tracer: preemptoff
#
-preemptoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 29 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: do_IRQ
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- sshd-4261 0d.h. 0us+: irq_enter (do_IRQ)
- sshd-4261 0d.s. 29us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq)
+# preemptoff latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 46 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: sshd-1991 (uid:0 nice:0 policy:0 rt_prio:0)
+# -----------------
+# => started at: do_IRQ
+# => ended at: do_IRQ
+#
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ sshd-1991 1d.h. 0us+: irq_enter <-do_IRQ
+ sshd-1991 1d..1 46us : irq_exit <-do_IRQ
+ sshd-1991 1d..1 47us+: trace_preempt_on <-do_IRQ
+ sshd-1991 1d..1 52us : <stack trace>
+ => sub_preempt_count
+ => irq_exit
+ => do_IRQ
+ => ret_from_intr
This has some more changes. Preemption was disabled when an
-interrupt came in (notice the 'h'), and was enabled while doing
-a softirq. (notice the 's'). But we also see that interrupts
-have been disabled when entering the preempt off section and
-leaving it (the 'd'). We do not know if interrupts were enabled
-in the mean time.
+interrupt came in (notice the 'h'), and was enabled on exit.
+But we also see that interrupts have been disabled when entering
+the preempt off section and leaving it (the 'd'). We do not know if
+interrupts were enabled in the meantime or shortly after this
+was over.
# tracer: preemptoff
#
-preemptoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 63 us, #87/87, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: remove_wait_queue
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- sshd-4261 0d..1 0us : _spin_lock_irqsave (remove_wait_queue)
- sshd-4261 0d..1 1us : _spin_unlock_irqrestore (remove_wait_queue)
- sshd-4261 0d..1 2us : do_IRQ (common_interrupt)
- sshd-4261 0d..1 2us : irq_enter (do_IRQ)
- sshd-4261 0d..1 2us : idle_cpu (irq_enter)
- sshd-4261 0d..1 3us : add_preempt_count (irq_enter)
- sshd-4261 0d.h1 3us : idle_cpu (irq_enter)
- sshd-4261 0d.h. 4us : handle_fasteoi_irq (do_IRQ)
+# preemptoff latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 83 us, #241/241, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: bash-1994 (uid:0 nice:0 policy:0 rt_prio:0)
+# -----------------
+# => started at: wake_up_new_task
+# => ended at: task_rq_unlock
+#
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ bash-1994 1d..1 0us : _raw_spin_lock_irqsave <-wake_up_new_task
+ bash-1994 1d..1 0us : select_task_rq_fair <-select_task_rq
+ bash-1994 1d..1 1us : __rcu_read_lock <-select_task_rq_fair
+ bash-1994 1d..1 1us : source_load <-select_task_rq_fair
+ bash-1994 1d..1 1us : source_load <-select_task_rq_fair
[...]
- sshd-4261 0d.h. 12us : add_preempt_count (_spin_lock)
- sshd-4261 0d.h1 12us : ack_ioapic_quirk_irq (handle_fasteoi_irq)
- sshd-4261 0d.h1 13us : move_native_irq (ack_ioapic_quirk_irq)
- sshd-4261 0d.h1 13us : _spin_unlock (handle_fasteoi_irq)
- sshd-4261 0d.h1 14us : sub_preempt_count (_spin_unlock)
- sshd-4261 0d.h1 14us : irq_exit (do_IRQ)
- sshd-4261 0d.h1 15us : sub_preempt_count (irq_exit)
- sshd-4261 0d..2 15us : do_softirq (irq_exit)
- sshd-4261 0d... 15us : __do_softirq (do_softirq)
- sshd-4261 0d... 16us : __local_bh_disable (__do_softirq)
- sshd-4261 0d... 16us+: add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s4 20us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s4 21us : sub_preempt_count (local_bh_enable)
- sshd-4261 0d.s5 21us : sub_preempt_count (local_bh_enable)
+ bash-1994 1d..1 12us : irq_enter <-smp_apic_timer_interrupt
+ bash-1994 1d..1 12us : rcu_irq_enter <-irq_enter
+ bash-1994 1d..1 13us : add_preempt_count <-irq_enter
+ bash-1994 1d.h1 13us : exit_idle <-smp_apic_timer_interrupt
+ bash-1994 1d.h1 13us : hrtimer_interrupt <-smp_apic_timer_interrupt
+ bash-1994 1d.h1 13us : _raw_spin_lock <-hrtimer_interrupt
+ bash-1994 1d.h1 14us : add_preempt_count <-_raw_spin_lock
+ bash-1994 1d.h2 14us : ktime_get_update_offsets <-hrtimer_interrupt
[...]
- sshd-4261 0d.s6 41us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s6 42us : sub_preempt_count (local_bh_enable)
- sshd-4261 0d.s7 42us : sub_preempt_count (local_bh_enable)
- sshd-4261 0d.s5 43us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s5 43us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s6 44us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s5 44us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s5 45us : sub_preempt_count (local_bh_enable)
+ bash-1994 1d.h1 35us : lapic_next_event <-clockevents_program_event
+ bash-1994 1d.h1 35us : irq_exit <-smp_apic_timer_interrupt
+ bash-1994 1d.h1 36us : sub_preempt_count <-irq_exit
+ bash-1994 1d..2 36us : do_softirq <-irq_exit
+ bash-1994 1d..2 36us : __do_softirq <-call_softirq
+ bash-1994 1d..2 36us : __local_bh_disable <-__do_softirq
+ bash-1994 1d.s2 37us : add_preempt_count <-_raw_spin_lock_irq
+ bash-1994 1d.s3 38us : _raw_spin_unlock <-run_timer_softirq
+ bash-1994 1d.s3 39us : sub_preempt_count <-_raw_spin_unlock
+ bash-1994 1d.s2 39us : call_timer_fn <-run_timer_softirq
[...]
- sshd-4261 0d.s. 63us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq)
+ bash-1994 1dNs2 81us : cpu_needs_another_gp <-rcu_process_callbacks
+ bash-1994 1dNs2 82us : __local_bh_enable <-__do_softirq
+ bash-1994 1dNs2 82us : sub_preempt_count <-__local_bh_enable
+ bash-1994 1dN.2 82us : idle_cpu <-irq_exit
+ bash-1994 1dN.2 83us : rcu_irq_exit <-irq_exit
+ bash-1994 1dN.2 83us : sub_preempt_count <-irq_exit
+ bash-1994 1.N.1 84us : _raw_spin_unlock_irqrestore <-task_rq_unlock
+ bash-1994 1.N.1 84us+: trace_preempt_on <-task_rq_unlock
+ bash-1994 1.N.1 104us : <stack trace>
+ => sub_preempt_count
+ => _raw_spin_unlock_irqrestore
+ => task_rq_unlock
+ => wake_up_new_task
+ => do_fork
+ => sys_clone
+ => stub_clone
The above is an example of the preemptoff trace with
-ftrace_enabled set. Here we see that interrupts were disabled
+function-trace set. Here we see that interrupts were not disabled
the entire time. The irq_enter code lets us know that we entered
an interrupt 'h'. Before that, the functions being traced still
show that it is not in an interrupt, but we can see from the
functions themselves that this is not the case.
-Notice that __do_softirq when called does not have a
-preempt_count. It may seem that we missed a preempt enabling.
-What really happened is that the preempt count is held on the
-thread's stack and we switched to the softirq stack (4K stacks
-in effect). The code does not copy the preempt count, but
-because interrupts are disabled, we do not need to worry about
-it. Having a tracer like this is good for letting people know
-what really happens inside the kernel.
-
-
preemptirqsoff
--------------
@@ -762,38 +1199,57 @@ tracer.
Again, using this trace is much like the irqsoff and preemptoff
tracers.
+ # echo 0 > options/function-trace
# echo preemptirqsoff > current_tracer
- # echo latency-format > trace_options
- # echo 0 > tracing_max_latency
# echo 1 > tracing_on
+ # echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
# tracer: preemptirqsoff
#
-preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 293 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: ls-4860 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: apic_timer_interrupt
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- ls-4860 0d... 0us!: trace_hardirqs_off_thunk (apic_timer_interrupt)
- ls-4860 0d.s. 294us : _local_bh_enable (__do_softirq)
- ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq)
-
+# preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 100 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: ls-2230 (uid:0 nice:0 policy:0 rt_prio:0)
+# -----------------
+# => started at: ata_scsi_queuecmd
+# => ended at: ata_scsi_queuecmd
+#
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ ls-2230 3d... 0us+: _raw_spin_lock_irqsave <-ata_scsi_queuecmd
+ ls-2230 3...1 100us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
+ ls-2230 3...1 101us+: trace_preempt_on <-ata_scsi_queuecmd
+ ls-2230 3...1 111us : <stack trace>
+ => sub_preempt_count
+ => _raw_spin_unlock_irqrestore
+ => ata_scsi_queuecmd
+ => scsi_dispatch_cmd
+ => scsi_request_fn
+ => __blk_run_queue_uncond
+ => __blk_run_queue
+ => blk_queue_bio
+ => generic_make_request
+ => submit_bio
+ => submit_bh
+ => ext3_bread
+ => ext3_dir_bread
+ => htree_dirblock_to_tree
+ => ext3_htree_fill_tree
+ => ext3_readdir
+ => vfs_readdir
+ => sys_getdents
+ => system_call_fastpath
The trace_hardirqs_off_thunk is called from assembly on x86 when
@@ -802,105 +1258,158 @@ function tracing, we do not know if interrupts were enabled
within the preemption points. We do see that it started with
preemption enabled.
-Here is a trace with ftrace_enabled set:
-
+Here is a trace with function-trace set:
# tracer: preemptirqsoff
#
-preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 105 us, #183/183, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: write_chan
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- ls-4473 0.N.. 0us : preempt_schedule (write_chan)
- ls-4473 0dN.1 1us : _spin_lock (schedule)
- ls-4473 0dN.1 2us : add_preempt_count (_spin_lock)
- ls-4473 0d..2 2us : put_prev_task_fair (schedule)
-[...]
- ls-4473 0d..2 13us : set_normalized_timespec (ktime_get_ts)
- ls-4473 0d..2 13us : __switch_to (schedule)
- sshd-4261 0d..2 14us : finish_task_switch (schedule)
- sshd-4261 0d..2 14us : _spin_unlock_irq (finish_task_switch)
- sshd-4261 0d..1 15us : add_preempt_count (_spin_lock_irqsave)
- sshd-4261 0d..2 16us : _spin_unlock_irqrestore (hrtick_set)
- sshd-4261 0d..2 16us : do_IRQ (common_interrupt)
- sshd-4261 0d..2 17us : irq_enter (do_IRQ)
- sshd-4261 0d..2 17us : idle_cpu (irq_enter)
- sshd-4261 0d..2 18us : add_preempt_count (irq_enter)
- sshd-4261 0d.h2 18us : idle_cpu (irq_enter)
- sshd-4261 0d.h. 18us : handle_fasteoi_irq (do_IRQ)
- sshd-4261 0d.h. 19us : _spin_lock (handle_fasteoi_irq)
- sshd-4261 0d.h. 19us : add_preempt_count (_spin_lock)
- sshd-4261 0d.h1 20us : _spin_unlock (handle_fasteoi_irq)
- sshd-4261 0d.h1 20us : sub_preempt_count (_spin_unlock)
-[...]
- sshd-4261 0d.h1 28us : _spin_unlock (handle_fasteoi_irq)
- sshd-4261 0d.h1 29us : sub_preempt_count (_spin_unlock)
- sshd-4261 0d.h2 29us : irq_exit (do_IRQ)
- sshd-4261 0d.h2 29us : sub_preempt_count (irq_exit)
- sshd-4261 0d..3 30us : do_softirq (irq_exit)
- sshd-4261 0d... 30us : __do_softirq (do_softirq)
- sshd-4261 0d... 31us : __local_bh_disable (__do_softirq)
- sshd-4261 0d... 31us+: add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s4 34us : add_preempt_count (__local_bh_disable)
+# preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 161 us, #339/339, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: ls-2269 (uid:0 nice:0 policy:0 rt_prio:0)
+# -----------------
+# => started at: schedule
+# => ended at: mutex_unlock
+#
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+kworker/-59 3...1 0us : __schedule <-schedule
+kworker/-59 3d..1 0us : rcu_preempt_qs <-rcu_note_context_switch
+kworker/-59 3d..1 1us : add_preempt_count <-_raw_spin_lock_irq
+kworker/-59 3d..2 1us : deactivate_task <-__schedule
+kworker/-59 3d..2 1us : dequeue_task <-deactivate_task
+kworker/-59 3d..2 2us : update_rq_clock <-dequeue_task
+kworker/-59 3d..2 2us : dequeue_task_fair <-dequeue_task
+kworker/-59 3d..2 2us : update_curr <-dequeue_task_fair
+kworker/-59 3d..2 2us : update_min_vruntime <-update_curr
+kworker/-59 3d..2 3us : cpuacct_charge <-update_curr
+kworker/-59 3d..2 3us : __rcu_read_lock <-cpuacct_charge
+kworker/-59 3d..2 3us : __rcu_read_unlock <-cpuacct_charge
+kworker/-59 3d..2 3us : update_cfs_rq_blocked_load <-dequeue_task_fair
+kworker/-59 3d..2 4us : clear_buddies <-dequeue_task_fair
+kworker/-59 3d..2 4us : account_entity_dequeue <-dequeue_task_fair
+kworker/-59 3d..2 4us : update_min_vruntime <-dequeue_task_fair
+kworker/-59 3d..2 4us : update_cfs_shares <-dequeue_task_fair
+kworker/-59 3d..2 5us : hrtick_update <-dequeue_task_fair
+kworker/-59 3d..2 5us : wq_worker_sleeping <-__schedule
+kworker/-59 3d..2 5us : kthread_data <-wq_worker_sleeping
+kworker/-59 3d..2 5us : put_prev_task_fair <-__schedule
+kworker/-59 3d..2 6us : pick_next_task_fair <-pick_next_task
+kworker/-59 3d..2 6us : clear_buddies <-pick_next_task_fair
+kworker/-59 3d..2 6us : set_next_entity <-pick_next_task_fair
+kworker/-59 3d..2 6us : update_stats_wait_end <-set_next_entity
+ ls-2269 3d..2 7us : finish_task_switch <-__schedule
+ ls-2269 3d..2 7us : _raw_spin_unlock_irq <-finish_task_switch
+ ls-2269 3d..2 8us : do_IRQ <-ret_from_intr
+ ls-2269 3d..2 8us : irq_enter <-do_IRQ
+ ls-2269 3d..2 8us : rcu_irq_enter <-irq_enter
+ ls-2269 3d..2 9us : add_preempt_count <-irq_enter
+ ls-2269 3d.h2 9us : exit_idle <-do_IRQ
[...]
- sshd-4261 0d.s3 43us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s4 44us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s3 44us : smp_apic_timer_interrupt (apic_timer_interrupt)
- sshd-4261 0d.s3 45us : irq_enter (smp_apic_timer_interrupt)
- sshd-4261 0d.s3 45us : idle_cpu (irq_enter)
- sshd-4261 0d.s3 46us : add_preempt_count (irq_enter)
- sshd-4261 0d.H3 46us : idle_cpu (irq_enter)
- sshd-4261 0d.H3 47us : hrtimer_interrupt (smp_apic_timer_interrupt)
- sshd-4261 0d.H3 47us : ktime_get (hrtimer_interrupt)
+ ls-2269 3d.h3 20us : sub_preempt_count <-_raw_spin_unlock
+ ls-2269 3d.h2 20us : irq_exit <-do_IRQ
+ ls-2269 3d.h2 21us : sub_preempt_count <-irq_exit
+ ls-2269 3d..3 21us : do_softirq <-irq_exit
+ ls-2269 3d..3 21us : __do_softirq <-call_softirq
+ ls-2269 3d..3 21us+: __local_bh_disable <-__do_softirq
+ ls-2269 3d.s4 29us : sub_preempt_count <-_local_bh_enable_ip
+ ls-2269 3d.s5 29us : sub_preempt_count <-_local_bh_enable_ip
+ ls-2269 3d.s5 31us : do_IRQ <-ret_from_intr
+ ls-2269 3d.s5 31us : irq_enter <-do_IRQ
+ ls-2269 3d.s5 31us : rcu_irq_enter <-irq_enter
[...]
- sshd-4261 0d.H3 81us : tick_program_event (hrtimer_interrupt)
- sshd-4261 0d.H3 82us : ktime_get (tick_program_event)
- sshd-4261 0d.H3 82us : ktime_get_ts (ktime_get)
- sshd-4261 0d.H3 83us : getnstimeofday (ktime_get_ts)
- sshd-4261 0d.H3 83us : set_normalized_timespec (ktime_get_ts)
- sshd-4261 0d.H3 84us : clockevents_program_event (tick_program_event)
- sshd-4261 0d.H3 84us : lapic_next_event (clockevents_program_event)
- sshd-4261 0d.H3 85us : irq_exit (smp_apic_timer_interrupt)
- sshd-4261 0d.H3 85us : sub_preempt_count (irq_exit)
- sshd-4261 0d.s4 86us : sub_preempt_count (irq_exit)
- sshd-4261 0d.s3 86us : add_preempt_count (__local_bh_disable)
+ ls-2269 3d.s5 31us : rcu_irq_enter <-irq_enter
+ ls-2269 3d.s5 32us : add_preempt_count <-irq_enter
+ ls-2269 3d.H5 32us : exit_idle <-do_IRQ
+ ls-2269 3d.H5 32us : handle_irq <-do_IRQ
+ ls-2269 3d.H5 32us : irq_to_desc <-handle_irq
+ ls-2269 3d.H5 33us : handle_fasteoi_irq <-handle_irq
[...]
- sshd-4261 0d.s1 98us : sub_preempt_count (net_rx_action)
- sshd-4261 0d.s. 99us : add_preempt_count (_spin_lock_irq)
- sshd-4261 0d.s1 99us+: _spin_unlock_irq (run_timer_softirq)
- sshd-4261 0d.s. 104us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s. 104us : sub_preempt_count (_local_bh_enable)
- sshd-4261 0d.s. 105us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq)
-
-
-This is a very interesting trace. It started with the preemption
-of the ls task. We see that the task had the "need_resched" bit
-set via the 'N' in the trace. Interrupts were disabled before
-the spin_lock at the beginning of the trace. We see that a
-schedule took place to run sshd. When the interrupts were
-enabled, we took an interrupt. On return from the interrupt
-handler, the softirq ran. We took another interrupt while
-running the softirq as we see from the capital 'H'.
+ ls-2269 3d.s5 158us : _raw_spin_unlock_irqrestore <-rtl8139_poll
+ ls-2269 3d.s3 158us : net_rps_action_and_irq_enable.isra.65 <-net_rx_action
+ ls-2269 3d.s3 159us : __local_bh_enable <-__do_softirq
+ ls-2269 3d.s3 159us : sub_preempt_count <-__local_bh_enable
+ ls-2269 3d..3 159us : idle_cpu <-irq_exit
+ ls-2269 3d..3 159us : rcu_irq_exit <-irq_exit
+ ls-2269 3d..3 160us : sub_preempt_count <-irq_exit
+ ls-2269 3d... 161us : __mutex_unlock_slowpath <-mutex_unlock
+ ls-2269 3d... 162us+: trace_hardirqs_on <-mutex_unlock
+ ls-2269 3d... 186us : <stack trace>
+ => __mutex_unlock_slowpath
+ => mutex_unlock
+ => process_output
+ => n_tty_write
+ => tty_write
+ => vfs_write
+ => sys_write
+ => system_call_fastpath
+
+This is an interesting trace. It started with kworker running and
+scheduling out and ls taking over. But as soon as ls released the
+rq lock and enabled interrupts (but not preemption) an interrupt
+triggered. When the interrupt finished, it started running softirqs.
+But while the softirq was running, another interrupt triggered.
+When an interrupt is running inside a softirq, the annotation is 'H'.
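
The flag columns can also be decoded mechanically. The following is a
minimal sketch and not part of ftrace: decode_flags is a hypothetical
helper name, and it assumes the four flag characters follow the order
given in the trace header (irqs-off, need-resched, hardirq/softirq,
preempt-depth), optionally preceded by the CPU number as in the
latency format.

```shell
#!/bin/bash
# decode_flags: hypothetical helper, not an ftrace interface.
# Takes a flags field such as "d.h1" or, with a leading CPU number,
# "3d.H5", and prints what each of the last four characters means.
decode_flags() {
    local s=$1
    local f=${s: -4}          # keep only the four flag characters
    local out

    if [ "${f:0:1}" = "d" ]; then out="irqs-off"; else out="irqs-on"; fi
    if [ "${f:1:1}" = "N" ]; then out="$out need-resched"; fi
    case "${f:2:1}" in
        h) out="$out hardirq" ;;
        s) out="$out softirq" ;;
        H) out="$out hardirq-in-softirq" ;;
    esac
    if [ "${f:3:1}" = "." ]; then
        out="$out preempt-depth=0"
    else
        out="$out preempt-depth=${f:3:1}"
    fi
    echo "$out"
}
```

For example, "decode_flags 3d.H5" reports irqs-off, hardirq-in-softirq
and a preempt-depth of 5, which matches the interrupt that arrived
while the softirq was running.
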
wakeup
------
+One common case that people are interested in tracing is the
+time it takes for a task that is woken to actually wake up.
+For non Real-Time tasks, this time can be arbitrary, but tracing
+it can nonetheless be interesting.
+
+Without function tracing:
+
+ # echo 0 > options/function-trace
+ # echo wakeup > current_tracer
+ # echo 1 > tracing_on
+ # echo 0 > tracing_max_latency
+ # chrt -f 5 sleep 1
+ # echo 0 > tracing_on
+ # cat trace
+# tracer: wakeup
+#
+# wakeup latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 15 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: kworker/3:1H-312 (uid:0 nice:-20 policy:0 rt_prio:0)
+# -----------------
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ <idle>-0 3dNs7 0us : 0:120:R + [003] 312:100:R kworker/3:1H
+ <idle>-0 3dNs7 1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
+ <idle>-0 3d..3 15us : __schedule <-schedule
+ <idle>-0 3d..3 15us : 0:120:R ==> [003] 312:100:R kworker/3:1H
+
+The tracer only traces the highest priority task in the system
+to avoid tracing the normal circumstances. Here we see that
+the kworker, with a nice priority of -20 (not very nice), took
+just 15 microseconds from the time it woke up to the time it
+ran.
+
+Non Real-Time tasks are not that interesting. A more interesting
+trace is to concentrate only on Real-Time tasks.
+
+wakeup_rt
+---------
+
In a Real-Time environment it is very important to know the
wakeup time it takes for the highest priority task that is woken
up to the time that it executes. This is also known as "schedule
@@ -914,124 +1423,229 @@ Real-Time environments are interested in the worst case latency.
That is the longest latency it takes for something to happen,
and not the average. We can have a very fast scheduler that may
only have a large latency once in a while, but that would not
-work well with Real-Time tasks. The wakeup tracer was designed
+work well with Real-Time tasks. The wakeup_rt tracer was designed
to record the worst case wakeups of RT tasks. Non-RT tasks are
not recorded because the tracer only records one worst case and
tracing non-RT tasks that are unpredictable will overwrite the
-worst case latency of RT tasks.
+worst case latency of RT tasks (just run the normal wakeup
+tracer for a while to see that effect).
Since this tracer only deals with RT tasks, we will run this
slightly differently than we did with the previous tracers.
Instead of performing an 'ls', we will run 'sleep 1' under
'chrt' which changes the priority of the task.
- # echo wakeup > current_tracer
- # echo latency-format > trace_options
- # echo 0 > tracing_max_latency
+ # echo 0 > options/function-trace
+ # echo wakeup_rt > current_tracer
# echo 1 > tracing_on
+ # echo 0 > tracing_max_latency
# chrt -f 5 sleep 1
# echo 0 > tracing_on
# cat trace
# tracer: wakeup
#
-wakeup latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 4 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sleep-4901 (uid:0 nice:0 policy:1 rt_prio:5)
- -----------------
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- <idle>-0 1d.h4 0us+: try_to_wake_up (wake_up_process)
- <idle>-0 1d..4 4us : schedule (cpu_idle)
-
-
-Running this on an idle system, we see that it only took 4
-microseconds to perform the task switch. Note, since the trace
-marker in the schedule is before the actual "switch", we stop
-the tracing when the recorded task is about to schedule in. This
-may change if we add a new marker at the end of the scheduler.
-
-Notice that the recorded task is 'sleep' with the PID of 4901
+# tracer: wakeup_rt
+#
+# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 5 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: sleep-2389 (uid:0 nice:0 policy:1 rt_prio:5)
+# -----------------
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ <idle>-0 3d.h4 0us : 0:120:R + [003] 2389: 94:R sleep
+ <idle>-0 3d.h4 1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
+ <idle>-0 3d..3 5us : __schedule <-schedule
+ <idle>-0 3d..3 5us : 0:120:R ==> [003] 2389: 94:R sleep
+
+
+Running this on an idle system, we see that it only took 5 microseconds
+to perform the task switch. Note, since the trace point in the schedule
+is before the actual "switch", we stop the tracing when the recorded task
+is about to schedule in. This may change if we add a new marker at the
+end of the scheduler.
+
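
When comparing several runs it can help to pull just the latency
figure out of the header. A minimal sketch, assuming the header line
keeps the "# latency: N us" form shown above; trace_latency_us is a
hypothetical helper name, not something ftrace provides.

```shell
#!/bin/bash
# trace_latency_us: hypothetical helper. Reads a latency-format trace
# on stdin and prints the number from the "# latency: N us" header.
trace_latency_us() {
    sed -n 's/^# latency: \([0-9][0-9]*\) us.*/\1/p'
}
```

It would be used from the tracing directory as
"trace_latency_us < trace". Nothing is printed if the header line is
missing.
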
+Notice that the recorded task is 'sleep' with the PID of 2389
and it has an rt_prio of 5. This priority is user-space priority
and not the internal kernel priority. The policy is 1 for
SCHED_FIFO and 2 for SCHED_RR.
-Doing the same with chrt -r 5 and ftrace_enabled set.
+Note that the trace data shows the internal priority (99 - rtprio).
-# tracer: wakeup
+ <idle>-0 3d..3 5us : 0:120:R ==> [003] 2389: 94:R sleep
+
+The 0:120:R means idle was running with a nice priority of 0 (120 - 120)
+and in the running state 'R'. The sleep task was scheduled in with
+2389: 94:R. That is, the priority is the kernel rtprio (99 - 5 = 94)
+and it too is in the running state.
+
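
The two conversions can be put together in a few lines of shell. This
is only a sketch: describe_prio is a hypothetical name, and the
"nice = prio - 120" mapping for non Real-Time tasks is inferred from
the kworker example earlier, where a nice value of -20 shows up as
priority 100 in the trace data.

```shell
#!/bin/bash
# describe_prio: hypothetical helper. Converts the internal priority
# printed in the trace data (e.g. 94 or 120) back to the user-space
# view: rt_prio for Real-Time tasks, nice for everything else.
describe_prio() {
    local prio=$1
    if [ "$prio" -lt 100 ]; then
        echo "rt_prio $((99 - prio))"      # 94 -> rt_prio 5
    else
        echo "nice $((prio - 120))"        # 120 -> nice 0
    fi
}
```

With the values from the traces above, "describe_prio 94" gives
rt_prio 5 and "describe_prio 100" gives nice -20.
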
+Doing the same with chrt -r 5 and function-trace set.
+
+ echo 1 > options/function-trace
+
+# tracer: wakeup_rt
#
-wakeup latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 50 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sleep-4068 (uid:0 nice:0 policy:2 rt_prio:5)
- -----------------
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
-ksoftirq-7 1d.H3 0us : try_to_wake_up (wake_up_process)
-ksoftirq-7 1d.H4 1us : sub_preempt_count (marker_probe_cb)
-ksoftirq-7 1d.H3 2us : check_preempt_wakeup (try_to_wake_up)
-ksoftirq-7 1d.H3 3us : update_curr (check_preempt_wakeup)
-ksoftirq-7 1d.H3 4us : calc_delta_mine (update_curr)
-ksoftirq-7 1d.H3 5us : __resched_task (check_preempt_wakeup)
-ksoftirq-7 1d.H3 6us : task_wake_up_rt (try_to_wake_up)
-ksoftirq-7 1d.H3 7us : _spin_unlock_irqrestore (try_to_wake_up)
-[...]
-ksoftirq-7 1d.H2 17us : irq_exit (smp_apic_timer_interrupt)
-ksoftirq-7 1d.H2 18us : sub_preempt_count (irq_exit)
-ksoftirq-7 1d.s3 19us : sub_preempt_count (irq_exit)
-ksoftirq-7 1..s2 20us : rcu_process_callbacks (__do_softirq)
-[...]
-ksoftirq-7 1..s2 26us : __rcu_process_callbacks (rcu_process_callbacks)
-ksoftirq-7 1d.s2 27us : _local_bh_enable (__do_softirq)
-ksoftirq-7 1d.s2 28us : sub_preempt_count (_local_bh_enable)
-ksoftirq-7 1.N.3 29us : sub_preempt_count (ksoftirqd)
-ksoftirq-7 1.N.2 30us : _cond_resched (ksoftirqd)
-ksoftirq-7 1.N.2 31us : __cond_resched (_cond_resched)
-ksoftirq-7 1.N.2 32us : add_preempt_count (__cond_resched)
-ksoftirq-7 1.N.2 33us : schedule (__cond_resched)
-ksoftirq-7 1.N.2 33us : add_preempt_count (schedule)
-ksoftirq-7 1.N.3 34us : hrtick_clear (schedule)
-ksoftirq-7 1dN.3 35us : _spin_lock (schedule)
-ksoftirq-7 1dN.3 36us : add_preempt_count (_spin_lock)
-ksoftirq-7 1d..4 37us : put_prev_task_fair (schedule)
-ksoftirq-7 1d..4 38us : update_curr (put_prev_task_fair)
-[...]
-ksoftirq-7 1d..5 47us : _spin_trylock (tracing_record_cmdline)
-ksoftirq-7 1d..5 48us : add_preempt_count (_spin_trylock)
-ksoftirq-7 1d..6 49us : _spin_unlock (tracing_record_cmdline)
-ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock)
-ksoftirq-7 1d..4 50us : schedule (__cond_resched)
-
-The interrupt went off while running ksoftirqd. This task runs
-at SCHED_OTHER. Why did not we see the 'N' set early? This may
-be a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K
-stacks configured, the interrupt and softirq run with their own
-stack. Some information is held on the top of the task's stack
-(need_resched and preempt_count are both stored there). The
-setting of the NEED_RESCHED bit is done directly to the task's
-stack, but the reading of the NEED_RESCHED is done by looking at
-the current stack, which in this case is the stack for the hard
-interrupt. This hides the fact that NEED_RESCHED has been set.
-We do not see the 'N' until we switch back to the task's
-assigned stack.
+# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 29 us, #85/85, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: sleep-2448 (uid:0 nice:0 policy:1 rt_prio:5)
+# -----------------
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ <idle>-0 3d.h4 1us+: 0:120:R + [003] 2448: 94:R sleep
+ <idle>-0 3d.h4 2us : ttwu_do_activate.constprop.87 <-try_to_wake_up
+ <idle>-0 3d.h3 3us : check_preempt_curr <-ttwu_do_wakeup
+ <idle>-0 3d.h3 3us : resched_task <-check_preempt_curr
+ <idle>-0 3dNh3 4us : task_woken_rt <-ttwu_do_wakeup
+ <idle>-0 3dNh3 4us : _raw_spin_unlock <-try_to_wake_up
+ <idle>-0 3dNh3 4us : sub_preempt_count <-_raw_spin_unlock
+ <idle>-0 3dNh2 5us : ttwu_stat <-try_to_wake_up
+ <idle>-0 3dNh2 5us : _raw_spin_unlock_irqrestore <-try_to_wake_up
+ <idle>-0 3dNh2 6us : sub_preempt_count <-_raw_spin_unlock_irqrestore
+ <idle>-0 3dNh1 6us : _raw_spin_lock <-__run_hrtimer
+ <idle>-0 3dNh1 6us : add_preempt_count <-_raw_spin_lock
+ <idle>-0 3dNh2 7us : _raw_spin_unlock <-hrtimer_interrupt
+ <idle>-0 3dNh2 7us : sub_preempt_count <-_raw_spin_unlock
+ <idle>-0 3dNh1 7us : tick_program_event <-hrtimer_interrupt
+ <idle>-0 3dNh1 7us : clockevents_program_event <-tick_program_event
+ <idle>-0 3dNh1 8us : ktime_get <-clockevents_program_event
+ <idle>-0 3dNh1 8us : lapic_next_event <-clockevents_program_event
+ <idle>-0 3dNh1 8us : irq_exit <-smp_apic_timer_interrupt
+ <idle>-0 3dNh1 9us : sub_preempt_count <-irq_exit
+ <idle>-0 3dN.2 9us : idle_cpu <-irq_exit
+ <idle>-0 3dN.2 9us : rcu_irq_exit <-irq_exit
+ <idle>-0 3dN.2 10us : rcu_eqs_enter_common.isra.45 <-rcu_irq_exit
+ <idle>-0 3dN.2 10us : sub_preempt_count <-irq_exit
+ <idle>-0 3.N.1 11us : rcu_idle_exit <-cpu_idle
+ <idle>-0 3dN.1 11us : rcu_eqs_exit_common.isra.43 <-rcu_idle_exit
+ <idle>-0 3.N.1 11us : tick_nohz_idle_exit <-cpu_idle
+ <idle>-0 3dN.1 12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 12us : ktime_get <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 13us : update_cpu_load_nohz <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 13us : _raw_spin_lock <-update_cpu_load_nohz
+ <idle>-0 3dN.1 13us : add_preempt_count <-_raw_spin_lock
+ <idle>-0 3dN.2 13us : __update_cpu_load <-update_cpu_load_nohz
+ <idle>-0 3dN.2 14us : sched_avg_update <-__update_cpu_load
+ <idle>-0 3dN.2 14us : _raw_spin_unlock <-update_cpu_load_nohz
+ <idle>-0 3dN.2 14us : sub_preempt_count <-_raw_spin_unlock
+ <idle>-0 3dN.1 15us : calc_load_exit_idle <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 15us : hrtimer_cancel <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 15us : hrtimer_try_to_cancel <-hrtimer_cancel
+ <idle>-0 3dN.1 16us : lock_hrtimer_base.isra.18 <-hrtimer_try_to_cancel
+ <idle>-0 3dN.1 16us : _raw_spin_lock_irqsave <-lock_hrtimer_base.isra.18
+ <idle>-0 3dN.1 16us : add_preempt_count <-_raw_spin_lock_irqsave
+ <idle>-0 3dN.2 17us : __remove_hrtimer <-remove_hrtimer.part.16
+ <idle>-0 3dN.2 17us : hrtimer_force_reprogram <-__remove_hrtimer
+ <idle>-0 3dN.2 17us : tick_program_event <-hrtimer_force_reprogram
+ <idle>-0 3dN.2 18us : clockevents_program_event <-tick_program_event
+ <idle>-0 3dN.2 18us : ktime_get <-clockevents_program_event
+ <idle>-0 3dN.2 18us : lapic_next_event <-clockevents_program_event
+ <idle>-0 3dN.2 19us : _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
+ <idle>-0 3dN.2 19us : sub_preempt_count <-_raw_spin_unlock_irqrestore
+ <idle>-0 3dN.1 19us : hrtimer_forward <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 20us : ktime_add_safe <-hrtimer_forward
+ <idle>-0 3dN.1 20us : ktime_add_safe <-hrtimer_forward
+ <idle>-0 3dN.1 20us : hrtimer_start_range_ns <-hrtimer_start_expires.constprop.11
+ <idle>-0 3dN.1 20us : __hrtimer_start_range_ns <-hrtimer_start_range_ns
+ <idle>-0 3dN.1 21us : lock_hrtimer_base.isra.18 <-__hrtimer_start_range_ns
+ <idle>-0 3dN.1 21us : _raw_spin_lock_irqsave <-lock_hrtimer_base.isra.18
+ <idle>-0 3dN.1 21us : add_preempt_count <-_raw_spin_lock_irqsave
+ <idle>-0 3dN.2 22us : ktime_add_safe <-__hrtimer_start_range_ns
+ <idle>-0 3dN.2 22us : enqueue_hrtimer <-__hrtimer_start_range_ns
+ <idle>-0 3dN.2 22us : tick_program_event <-__hrtimer_start_range_ns
+ <idle>-0 3dN.2 23us : clockevents_program_event <-tick_program_event
+ <idle>-0 3dN.2 23us : ktime_get <-clockevents_program_event
+ <idle>-0 3dN.2 23us : lapic_next_event <-clockevents_program_event
+ <idle>-0 3dN.2 24us : _raw_spin_unlock_irqrestore <-__hrtimer_start_range_ns
+ <idle>-0 3dN.2 24us : sub_preempt_count <-_raw_spin_unlock_irqrestore
+ <idle>-0 3dN.1 24us : account_idle_ticks <-tick_nohz_idle_exit
+ <idle>-0 3dN.1 24us : account_idle_time <-account_idle_ticks
+ <idle>-0 3.N.1 25us : sub_preempt_count <-cpu_idle
+ <idle>-0 3.N.. 25us : schedule <-cpu_idle
+ <idle>-0 3.N.. 25us : __schedule <-preempt_schedule
+ <idle>-0 3.N.. 26us : add_preempt_count <-__schedule
+ <idle>-0 3.N.1 26us : rcu_note_context_switch <-__schedule
+ <idle>-0 3.N.1 26us : rcu_sched_qs <-rcu_note_context_switch
+ <idle>-0 3dN.1 27us : rcu_preempt_qs <-rcu_note_context_switch
+ <idle>-0 3.N.1 27us : _raw_spin_lock_irq <-__schedule
+ <idle>-0 3dN.1 27us : add_preempt_count <-_raw_spin_lock_irq
+ <idle>-0 3dN.2 28us : put_prev_task_idle <-__schedule
+ <idle>-0 3dN.2 28us : pick_next_task_stop <-pick_next_task
+ <idle>-0 3dN.2 28us : pick_next_task_rt <-pick_next_task
+ <idle>-0 3dN.2 29us : dequeue_pushable_task <-pick_next_task_rt
+ <idle>-0 3d..3 29us : __schedule <-preempt_schedule
+ <idle>-0 3d..3 30us : 0:120:R ==> [003] 2448: 94:R sleep
+
+This isn't that big of a trace, even with function tracing enabled,
+so I included the entire trace.
+
+The interrupt went off while the system was idle. Somewhere
+before task_woken_rt() was called, the NEED_RESCHED flag was set;
+this is indicated by the first occurrence of the 'N' flag.
+
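
In a longer trace the first 'N' is easier to find with a filter than
by eye. A minimal sketch, assuming the latency format shown above,
where the second whitespace-separated field is the CPU number followed
by the four flag characters; first_need_resched is a hypothetical
helper name.

```shell
#!/bin/bash
# first_need_resched: hypothetical helper. Reads a latency-format
# trace on stdin and prints the first entry whose need-resched flag
# (the second flag character) is 'N'. Header lines start with '#'
# and are skipped.
first_need_resched() {
    awk '$1 !~ /^#/ && $2 ~ /^[0-9]+.N..$/ { print; exit }'
}
```

Run as "first_need_resched < trace". On the trace above this prints
the task_woken_rt entry at 4us.
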
+Latency tracing and events
+--------------------------
+Function tracing can induce a much larger latency, but without
+seeing what happens within the latency it is hard to know what
+caused it. There is a middle ground, and that is enabling
+events.
+
+ # echo 0 > options/function-trace
+ # echo wakeup_rt > current_tracer
+ # echo 1 > events/enable
+ # echo 1 > tracing_on
+ # echo 0 > tracing_max_latency
+ # chrt -f 5 sleep 1
+ # echo 0 > tracing_on
+ # cat trace
+# tracer: wakeup_rt
+#
+# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
+# --------------------------------------------------------------------
+# latency: 6 us, #12/12, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+# -----------------
+# | task: sleep-5882 (uid:0 nice:0 policy:1 rt_prio:5)
+# -----------------
+#
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| / delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ <idle>-0 2d.h4 0us : 0:120:R + [002] 5882: 94:R sleep
+ <idle>-0 2d.h4 0us : ttwu_do_activate.constprop.87 <-try_to_wake_up
+ <idle>-0 2d.h4 1us : sched_wakeup: comm=sleep pid=5882 prio=94 success=1 target_cpu=002
+ <idle>-0 2dNh2 1us : hrtimer_expire_exit: hrtimer=ffff88007796feb8
+ <idle>-0 2.N.2 2us : power_end: cpu_id=2
+ <idle>-0 2.N.2 3us : cpu_idle: state=4294967295 cpu_id=2
+ <idle>-0 2dN.3 4us : hrtimer_cancel: hrtimer=ffff88007d50d5e0
+ <idle>-0 2dN.3 4us : hrtimer_start: hrtimer=ffff88007d50d5e0 function=tick_sched_timer expires=34311211000000 softexpires=34311211000000
+ <idle>-0 2.N.2 5us : rcu_utilization: Start context switch
+ <idle>-0 2.N.2 5us : rcu_utilization: End context switch
+ <idle>-0 2d..3 6us : __schedule <-schedule
+ <idle>-0 2d..3 6us : 0:120:R ==> [002] 5882: 94:R sleep
+
function
--------
@@ -1039,6 +1653,7 @@ function
This tracer is the function tracer. Enabling the function tracer
can be done from the debug file system. Make sure the
ftrace_enabled is set; otherwise this tracer is a nop.
+See the "ftrace_enabled" section below.
# sysctl kernel.ftrace_enabled=1
# echo function > current_tracer
@@ -1048,23 +1663,23 @@ ftrace_enabled is set; otherwise this tracer is a nop.
# cat trace
# tracer: function
#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4003 [00] 123.638713: finish_task_switch <-schedule
- bash-4003 [00] 123.638714: _spin_unlock_irq <-finish_task_switch
- bash-4003 [00] 123.638714: sub_preempt_count <-_spin_unlock_irq
- bash-4003 [00] 123.638715: hrtick_set <-schedule
- bash-4003 [00] 123.638715: _spin_lock_irqsave <-hrtick_set
- bash-4003 [00] 123.638716: add_preempt_count <-_spin_lock_irqsave
- bash-4003 [00] 123.638716: _spin_unlock_irqrestore <-hrtick_set
- bash-4003 [00] 123.638717: sub_preempt_count <-_spin_unlock_irqrestore
- bash-4003 [00] 123.638717: hrtick_clear <-hrtick_set
- bash-4003 [00] 123.638718: sub_preempt_count <-schedule
- bash-4003 [00] 123.638718: sub_preempt_count <-preempt_schedule
- bash-4003 [00] 123.638719: wait_for_completion <-__stop_machine_run
- bash-4003 [00] 123.638719: wait_for_common <-wait_for_completion
- bash-4003 [00] 123.638720: _spin_lock_irq <-wait_for_common
- bash-4003 [00] 123.638720: add_preempt_count <-_spin_lock_irq
+# entries-in-buffer/entries-written: 24799/24799 #P:4
+#
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ bash-1994 [002] .... 3082.063030: mutex_unlock <-rb_simple_write
+ bash-1994 [002] .... 3082.063031: __mutex_unlock_slowpath <-mutex_unlock
+ bash-1994 [002] .... 3082.063031: __fsnotify_parent <-fsnotify_modify
+ bash-1994 [002] .... 3082.063032: fsnotify <-fsnotify_modify
+ bash-1994 [002] .... 3082.063032: __srcu_read_lock <-fsnotify
+ bash-1994 [002] .... 3082.063032: add_preempt_count <-__srcu_read_lock
+ bash-1994 [002] ...1 3082.063032: sub_preempt_count <-__srcu_read_lock
+ bash-1994 [002] .... 3082.063033: __srcu_read_unlock <-fsnotify
[...]
@@ -1214,79 +1829,19 @@ int main (int argc, char **argv)
return 0;
}
+Or this simple script!
-hw-branch-tracer (x86 only)
----------------------------
-
-This tracer uses the x86 last branch tracing hardware feature to
-collect a branch trace on all cpus with relatively low overhead.
-
-The tracer uses a fixed-size circular buffer per cpu and only
-traces ring 0 branches. The trace file dumps that buffer in the
-following format:
-
-# tracer: hw-branch-tracer
-#
-# CPU# TO <- FROM
- 0 scheduler_tick+0xb5/0x1bf <- task_tick_idle+0x5/0x6
- 2 run_posix_cpu_timers+0x2b/0x72a <- run_posix_cpu_timers+0x25/0x72a
- 0 scheduler_tick+0x139/0x1bf <- scheduler_tick+0xed/0x1bf
- 0 scheduler_tick+0x17c/0x1bf <- scheduler_tick+0x148/0x1bf
- 2 run_posix_cpu_timers+0x9e/0x72a <- run_posix_cpu_timers+0x5e/0x72a
- 0 scheduler_tick+0x1b6/0x1bf <- scheduler_tick+0x1aa/0x1bf
-
-
-The tracer may be used to dump the trace for the oops'ing cpu on
-a kernel oops into the system log. To enable this,
-ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one
-can either use the sysctl function or set it via the proc system
-interface.
-
- sysctl kernel.ftrace_dump_on_oops=n
-
-or
-
- echo n > /proc/sys/kernel/ftrace_dump_on_oops
-
-If n = 1, ftrace will dump buffers of all CPUs, if n = 2 ftrace will
-only dump the buffer of the CPU that triggered the oops.
-
-Here's an example of such a dump after a null pointer
-dereference in a kernel module:
-
-[57848.105921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
-[57848.106019] IP: [<ffffffffa0000006>] open+0x6/0x14 [oops]
-[57848.106019] PGD 2354e9067 PUD 2375e7067 PMD 0
-[57848.106019] Oops: 0002 [#1] SMP
-[57848.106019] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:20:05.0/local_cpus
-[57848.106019] Dumping ftrace buffer:
-[57848.106019] ---------------------------------
-[...]
-[57848.106019] 0 chrdev_open+0xe6/0x165 <- cdev_put+0x23/0x24
-[57848.106019] 0 chrdev_open+0x117/0x165 <- chrdev_open+0xfa/0x165
-[57848.106019] 0 chrdev_open+0x120/0x165 <- chrdev_open+0x11c/0x165
-[57848.106019] 0 chrdev_open+0x134/0x165 <- chrdev_open+0x12b/0x165
-[57848.106019] 0 open+0x0/0x14 [oops] <- chrdev_open+0x144/0x165
-[57848.106019] 0 page_fault+0x0/0x30 <- open+0x6/0x14 [oops]
-[57848.106019] 0 error_entry+0x0/0x5b <- page_fault+0x4/0x30
-[57848.106019] 0 error_kernelspace+0x0/0x31 <- error_entry+0x59/0x5b
-[57848.106019] 0 error_sti+0x0/0x1 <- error_kernelspace+0x2d/0x31
-[57848.106019] 0 page_fault+0x9/0x30 <- error_sti+0x0/0x1
-[57848.106019] 0 do_page_fault+0x0/0x881 <- page_fault+0x1a/0x30
-[...]
-[57848.106019] 0 do_page_fault+0x66b/0x881 <- is_prefetch+0x1ee/0x1f2
-[57848.106019] 0 do_page_fault+0x6e0/0x881 <- do_page_fault+0x67a/0x881
-[57848.106019] 0 oops_begin+0x0/0x96 <- do_page_fault+0x6e0/0x881
-[57848.106019] 0 trace_hw_branch_oops+0x0/0x2d <- oops_begin+0x9/0x96
-[...]
-[57848.106019] 0 ds_suspend_bts+0x2a/0xe3 <- ds_suspend_bts+0x1a/0xe3
-[57848.106019] ---------------------------------
-[57848.106019] CPU 0
-[57848.106019] Modules linked in: oops
-[57848.106019] Pid: 5542, comm: cat Tainted: G W 2.6.28 #23
-[57848.106019] RIP: 0010:[<ffffffffa0000006>] [<ffffffffa0000006>] open+0x6/0x14 [oops]
-[57848.106019] RSP: 0018:ffff880235457d48 EFLAGS: 00010246
-[...]
+------
+#!/bin/bash
+
+debugfs=`sed -ne 's/^debugfs \(.*\) debugfs.*/\1/p' /proc/mounts`
+echo nop > $debugfs/tracing/current_tracer
+echo 0 > $debugfs/tracing/tracing_on
+echo $$ > $debugfs/tracing/set_ftrace_pid
+echo function > $debugfs/tracing/current_tracer
+echo 1 > $debugfs/tracing/tracing_on
+exec "$@"
+------
function graph tracer
@@ -1473,16 +2028,18 @@ starts of pointing to a simple return. (Enabling FTRACE will
include the -pg switch in the compiling of the kernel.)
At compile time every C file object is run through the
-recordmcount.pl script (located in the scripts directory). This
-script will process the C object using objdump to find all the
-locations in the .text section that call mcount. (Note, only the
-.text section is processed, since processing other sections like
-.init.text may cause races due to those sections being freed).
+recordmcount program (located in the scripts directory). This
+program will parse the ELF headers in the C object to find all
+the locations in the .text section that call mcount. (Note, only
+white listed .text sections are processed, since processing other
+sections like .init.text may cause races due to those sections
+being freed unexpectedly).
A new section called "__mcount_loc" is created that holds
references to all the mcount call sites in the .text section.
-This section is compiled back into the original object. The
-final linker will add all these references into a single table.
+The recordmcount program re-links this section back into the
+original object. The final linking stage of the kernel will add all these
+references into a single table.
On boot up, before SMP is initialized, the dynamic ftrace code
scans this table and updates all the locations into nops. It
@@ -1493,13 +2050,25 @@ unloaded, it also removes its functions from the ftrace function
list. This is automatic in the module unload code, and the
module author does not need to worry about it.
-When tracing is enabled, kstop_machine is called to prevent
-races with the CPUS executing code being modified (which can
-cause the CPU to do undesirable things), and the nops are
+When tracing is enabled, the process of modifying the function
+tracepoints is dependent on architecture. The old method is to use
+kstop_machine to prevent races with the CPUs executing code being
+modified (which can cause the CPU to do undesirable things, especially
+if the modified code crosses cache (or page) boundaries), and the nops are
patched back to calls. But this time, they do not call mcount
(which is just a function stub). They now call into the ftrace
infrastructure.
+The new method of modifying the function tracepoints is to place
+a breakpoint at the location to be modified, sync all CPUs, and modify
+the rest of the instruction not covered by the breakpoint. Then sync
+all CPUs again, and replace the breakpoint with the finished
+version of the ftrace call site.
+
+Some archs do not even need to monkey around with the synchronization,
+and can just slap the new code on top of the old without any
+problems with other CPUs executing it at the same time.
+
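
The ordering of the breakpoint method can be shown with a toy model.
This is not the kernel's implementation. It assumes x86-style byte
patching (a one-byte int3 breakpoint, 0xcc, over a 5-byte nop that
becomes a 5-byte call), the call offset bytes are made up, and
sync_cpus is a placeholder that does nothing.

```shell
#!/bin/bash
# Toy model only: the call site is a bash array of hex bytes.
site=(0f 1f 44 00 00)   # 5-byte x86 nop at the call site
new=(e8 10 20 30 40)    # call opcode plus a made-up 4-byte offset

sync_cpus() { :; }      # placeholder for the architecture's CPU sync

patch_site() {
    local i
    site[0]=cc                      # breakpoint over the first byte
    echo "${site[*]}"
    sync_cpus
    for i in 1 2 3 4; do            # rest of the instruction
        site[$i]=${new[$i]}
    done
    echo "${site[*]}"
    sync_cpus
    site[0]=${new[0]}               # breakpoint replaced, patch complete
    echo "${site[*]}"
}
```

Calling patch_site prints the site after each step. At no point does
it hold a mix of old and new bytes without the breakpoint guarding the
first byte, which is the property the method relies on.
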
One special side-effect to the recording of the functions being
traced is that we can now selectively choose which functions we
wish to trace and which ones we want the mcount calls to remain
@@ -1530,20 +2099,28 @@ mutex_lock
If I am only interested in sys_nanosleep and hrtimer_interrupt:
- # echo sys_nanosleep hrtimer_interrupt \
- > set_ftrace_filter
+ # echo sys_nanosleep hrtimer_interrupt > set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
-# tracer: ftrace
+# tracer: function
+#
+# entries-in-buffer/entries-written: 5/5 #P:4
#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- usleep-4134 [00] 1317.070017: hrtimer_interrupt <-smp_apic_timer_interrupt
- usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
- <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ usleep-2665 [001] .... 4186.475355: sys_nanosleep <-system_call_fastpath
+ <idle>-0 [001] d.h1 4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
+ usleep-2665 [001] d.h1 4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
+ <idle>-0 [003] d.h1 4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
+ <idle>-0 [002] d.h1 4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt
To see which functions are being traced, you can cat the file:
@@ -1571,20 +2148,25 @@ Note: It is better to use quotes to enclose the wild cards,
Produces:
-# tracer: ftrace
+# tracer: function
#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4003 [00] 1480.611794: hrtimer_init <-copy_process
- bash-4003 [00] 1480.611941: hrtimer_start <-hrtick_set
- bash-4003 [00] 1480.611956: hrtimer_cancel <-hrtick_clear
- bash-4003 [00] 1480.611956: hrtimer_try_to_cancel <-hrtimer_cancel
- <idle>-0 [00] 1480.612019: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612025: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612032: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612037: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612382: hrtimer_get_next_event <-get_next_timer_interrupt
-
+# entries-in-buffer/entries-written: 897/897 #P:4
+#
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ <idle>-0 [003] dN.1 4228.547803: hrtimer_cancel <-tick_nohz_idle_exit
+ <idle>-0 [003] dN.1 4228.547804: hrtimer_try_to_cancel <-hrtimer_cancel
+ <idle>-0 [003] dN.2 4228.547805: hrtimer_force_reprogram <-__remove_hrtimer
+ <idle>-0 [003] dN.1 4228.547805: hrtimer_forward <-tick_nohz_idle_exit
+ <idle>-0 [003] dN.1 4228.547805: hrtimer_start_range_ns <-hrtimer_start_expires.constprop.11
+ <idle>-0 [003] d..1 4228.547858: hrtimer_get_next_event <-get_next_timer_interrupt
+ <idle>-0 [003] d..1 4228.547859: hrtimer_start <-__tick_nohz_idle_enter
+ <idle>-0 [003] d..2 4228.547860: hrtimer_force_reprogram <-__rem
Notice that we lost the sys_nanosleep.
@@ -1651,19 +2233,29 @@ traced.
Produces:
-# tracer: ftrace
+# tracer: function
+#
+# entries-in-buffer/entries-written: 39608/39608 #P:4
#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4043 [01] 115.281644: finish_task_switch <-schedule
- bash-4043 [01] 115.281645: hrtick_set <-schedule
- bash-4043 [01] 115.281645: hrtick_clear <-hrtick_set
- bash-4043 [01] 115.281646: wait_for_completion <-__stop_machine_run
- bash-4043 [01] 115.281647: wait_for_common <-wait_for_completion
- bash-4043 [01] 115.281647: kthread_stop <-stop_machine_run
- bash-4043 [01] 115.281648: init_waitqueue_head <-kthread_stop
- bash-4043 [01] 115.281648: wake_up_process <-kthread_stop
- bash-4043 [01] 115.281649: try_to_wake_up <-wake_up_process
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ bash-1994 [000] .... 4342.324896: file_ra_state_init <-do_dentry_open
+ bash-1994 [000] .... 4342.324897: open_check_o_direct <-do_last
+ bash-1994 [000] .... 4342.324897: ima_file_check <-do_last
+ bash-1994 [000] .... 4342.324898: process_measurement <-ima_file_check
+ bash-1994 [000] .... 4342.324898: ima_get_action <-process_measurement
+ bash-1994 [000] .... 4342.324898: ima_match_policy <-ima_get_action
+ bash-1994 [000] .... 4342.324899: do_truncate <-do_last
+ bash-1994 [000] .... 4342.324899: should_remove_suid <-do_truncate
+ bash-1994 [000] .... 4342.324899: notify_change <-do_truncate
+ bash-1994 [000] .... 4342.324900: current_fs_time <-notify_change
+ bash-1994 [000] .... 4342.324900: current_kernel_time <-current_fs_time
+ bash-1994 [000] .... 4342.324900: timespec_trunc <-current_fs_time
We can see that there's no more lock or preempt tracing.
@@ -1729,6 +2321,28 @@ this special filter via:
echo > set_graph_function
+ftrace_enabled
+--------------
+
+Note, the proc sysctl ftrace_enabled is a big on/off switch for the
+function tracer. By default it is enabled (when function tracing is
+enabled in the kernel). If it is disabled, all function tracing is
+disabled. This includes not only the function tracers for ftrace, but
+also for any other uses (perf, kprobes, stack tracing, profiling, etc).
+
+Please disable this with care.
+
+This can be disabled (and enabled) with:
+
+ sysctl kernel.ftrace_enabled=0
+ sysctl kernel.ftrace_enabled=1
+
+ or
+
+ echo 0 > /proc/sys/kernel/ftrace_enabled
+ echo 1 > /proc/sys/kernel/ftrace_enabled
+
+
Filter commands
---------------
@@ -1763,12 +2377,58 @@ The following commands are supported:
echo '__schedule_bug:traceoff:5' > set_ftrace_filter
+ To always disable tracing when __schedule_bug is hit:
+
+ echo '__schedule_bug:traceoff' > set_ftrace_filter
+
These commands are cumulative whether or not they are appended
to set_ftrace_filter. To remove a command, prepend it by '!'
and drop the parameter:
+ echo '!__schedule_bug:traceoff:0' > set_ftrace_filter
+
+ The above removes the traceoff command for __schedule_bug
+ that has a counter. To remove commands without counters:
+
echo '!__schedule_bug:traceoff' > set_ftrace_filter
+- snapshot
+ Will cause a snapshot to be triggered when the function is hit.
+
+ echo 'native_flush_tlb_others:snapshot' > set_ftrace_filter
+
+ To only snapshot once:
+
+ echo 'native_flush_tlb_others:snapshot:1' > set_ftrace_filter
+
+ To remove the above commands:
+
+ echo '!native_flush_tlb_others:snapshot' > set_ftrace_filter
+ echo '!native_flush_tlb_others:snapshot:0' > set_ftrace_filter
+
+- enable_event/disable_event
+ These commands can enable or disable a trace event. Note, because
+ function tracing callbacks are very sensitive, when these commands
+ are registered, the trace point is activated, but disabled in
+ a "soft" mode. That is, the tracepoint will be called, but
+ just will not be traced. The event tracepoint stays in this mode
+ as long as there's a command that triggers it.
+
+ echo 'try_to_wake_up:enable_event:sched:sched_switch:2' > \
+ set_ftrace_filter
+
+ The format is:
+
+ <function>:enable_event:<system>:<event>[:count]
+ <function>:disable_event:<system>:<event>[:count]
+
+ To remove the events commands:
+
+
+ echo '!try_to_wake_up:enable_event:sched:sched_switch:0' > \
+ set_ftrace_filter
+ echo '!schedule:disable_event:sched:sched_switch' > \
+ set_ftrace_filter
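
The command format above can be assembled programmatically. The following is a minimal Python sketch, not part of ftrace itself: the helper names event_command and removal_command are hypothetical, and the rule that a counted command is removed with a trailing ":0" is taken from the removal examples shown above.

```python
def event_command(function, action, system, event, count=None):
    """Build '<function>:<action>:<system>:<event>[:count]'."""
    if action not in ("enable_event", "disable_event"):
        raise ValueError("action must be enable_event or disable_event")
    cmd = f"{function}:{action}:{system}:{event}"
    if count is not None:
        cmd += f":{count}"
    return cmd


def removal_command(function, action, system, event, counted):
    """Build the '!' form; a command registered with a count is removed with ':0'."""
    return "!" + event_command(function, action, system, event,
                               0 if counted else None)


# Strings that would be written to set_ftrace_filter (hypothetical helpers).
print(event_command("try_to_wake_up", "enable_event", "sched", "sched_switch", 2))
print(removal_command("try_to_wake_up", "enable_event", "sched", "sched_switch", True))
```

The resulting strings are what would be echoed into set_ftrace_filter; the sketch only builds them and does not touch the tracing directory.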
trace_pipe
----------
@@ -1787,28 +2447,31 @@ different. The trace is live.
# cat trace
# tracer: function
#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
+# entries-in-buffer/entries-written: 0/0 #P:4
+#
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
#
# cat /tmp/trace.out
- bash-4043 [00] 41.267106: finish_task_switch <-schedule
- bash-4043 [00] 41.267106: hrtick_set <-schedule
- bash-4043 [00] 41.267107: hrtick_clear <-hrtick_set
- bash-4043 [00] 41.267108: wait_for_completion <-__stop_machine_run
- bash-4043 [00] 41.267108: wait_for_common <-wait_for_completion
- bash-4043 [00] 41.267109: kthread_stop <-stop_machine_run
- bash-4043 [00] 41.267109: init_waitqueue_head <-kthread_stop
- bash-4043 [00] 41.267110: wake_up_process <-kthread_stop
- bash-4043 [00] 41.267110: try_to_wake_up <-wake_up_process
- bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up
+ bash-1994 [000] .... 5281.568961: mutex_unlock <-rb_simple_write
+ bash-1994 [000] .... 5281.568963: __mutex_unlock_slowpath <-mutex_unlock
+ bash-1994 [000] .... 5281.568963: __fsnotify_parent <-fsnotify_modify
+ bash-1994 [000] .... 5281.568964: fsnotify <-fsnotify_modify
+ bash-1994 [000] .... 5281.568964: __srcu_read_lock <-fsnotify
+ bash-1994 [000] .... 5281.568964: add_preempt_count <-__srcu_read_lock
+ bash-1994 [000] ...1 5281.568965: sub_preempt_count <-__srcu_read_lock
+ bash-1994 [000] .... 5281.568965: __srcu_read_unlock <-fsnotify
+ bash-1994 [000] .... 5281.568967: sys_dup2 <-system_call_fastpath
Note, reading the trace_pipe file will block until more input is
-added. By changing the tracer, trace_pipe will issue an EOF. We
-needed to set the function tracer _before_ we "cat" the
-trace_pipe file.
-
+added.
trace entries
-------------
@@ -1817,31 +2480,50 @@ Having too much or not enough data can be troublesome in
diagnosing an issue in the kernel. The file buffer_size_kb is
used to modify the size of the internal trace buffers. The
number listed is the number of entries that can be recorded per
-CPU. To know the full size, multiply the number of possible CPUS
+CPU. To know the full size, multiply the number of possible CPUs
with the number of entries.
# cat buffer_size_kb
1408 (units kilobytes)
-Note, to modify this, you must have tracing completely disabled.
-To do that, echo "nop" into the current_tracer. If the
-current_tracer is not set to "nop", an EINVAL error will be
-returned.
+Or simply read buffer_total_size_kb
+
+ # cat buffer_total_size_kb
+5632
+
+To modify the buffer, simply echo in a number (in 1024 byte segments).
- # echo nop > current_tracer
# echo 10000 > buffer_size_kb
# cat buffer_size_kb
10000 (units kilobytes)
-The number of pages which will be allocated is limited to a
-percentage of available memory. Allocating too much will produce
-an error.
+It will try to allocate as much as possible. If you allocate too
+much, it can cause Out-Of-Memory to trigger.
# echo 1000000000000 > buffer_size_kb
-bash: echo: write error: Cannot allocate memory
# cat buffer_size_kb
85
+The per_cpu buffers can be changed individually as well:
+
+ # echo 10000 > per_cpu/cpu0/buffer_size_kb
+ # echo 100 > per_cpu/cpu1/buffer_size_kb
+
+When the per_cpu buffers are not the same, the buffer_size_kb
+at the top level will just show an X
+
+ # cat buffer_size_kb
+X
+
+This is where the buffer_total_size_kb is useful:
+
+ # cat buffer_total_size_kb
+12916
+
+Writing to the top level buffer_size_kb will reset all the buffers
+to be the same again.
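
The totals shown above are consistent with a plain sum of the per-CPU sizes. The Python sketch below reproduces that arithmetic under two assumptions that the text does not state outright: that the machine has four CPUs (as the "#P:4" header elsewhere in this document suggests), and that cpu2 and cpu3 were still at 1408 KB when cpu0 and cpu1 were resized. The function names are hypothetical.

```python
def parse_size_kb(text):
    """Return the leading integer of a buffer_size_kb read, e.g. '1408 (units kilobytes)'."""
    return int(text.split()[0])


def total_size_kb(per_cpu_reads):
    """Sum the per-CPU sizes; assumed to match buffer_total_size_kb."""
    return sum(parse_size_kb(t) for t in per_cpu_reads)


# Four CPUs at 1408 KB each.
print(total_size_kb(["1408 (units kilobytes)"] * 4))      # 5632
# cpu0 and cpu1 resized, cpu2 and cpu3 assumed unchanged.
print(total_size_kb(["10000", "100", "1408", "1408"]))    # 12916
```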
+
Snapshot
--------
CONFIG_TRACER_SNAPSHOT makes a generic snapshot feature
@@ -1873,7 +2555,7 @@ feature:
status\input | 0 | 1 | else |
--------------+------------+------------+------------+
- not allocated |(do nothing)| alloc+swap | EINVAL |
+ not allocated |(do nothing)| alloc+swap |(do nothing)|
--------------+------------+------------+------------+
allocated | free | swap | clear |
--------------+------------+------------+------------+
@@ -1925,7 +2607,188 @@ bash: echo: write error: Device or resource busy
# cat snapshot
cat: snapshot: Device or resource busy
+
+Instances
+---------
+In the debugfs tracing directory is a directory called "instances".
+This directory can have new directories created inside of it using
+mkdir, and removing directories with rmdir. The directory created
+with mkdir in this directory will already contain files and other
+directories after it is created.
+
+ # mkdir instances/foo
+ # ls instances/foo
+buffer_size_kb buffer_total_size_kb events free_buffer per_cpu
+set_event snapshot trace trace_clock trace_marker trace_options
+trace_pipe tracing_on
+
+As you can see, the new directory looks similar to the tracing directory
+itself. In fact, it is very similar, except that the buffer and
+events are independent of the main directory, and of any other
+instances that are created.
+
+The files in the new directory work just like the files with the
+same name in the tracing directory except the buffer that is used
+is a separate and new buffer. The files affect that buffer but do not
+affect the main buffer with the exception of trace_options. Currently,
+the trace_options affect all instances and the top level buffer
+the same, but this may change in future releases. That is, options
+may become specific to the instance they reside in.
+
+Notice that none of the function tracer files are there, nor is
+current_tracer and available_tracers. This is because the buffers
+can currently only have events enabled for them.
+
+ # mkdir instances/foo
+ # mkdir instances/bar
+ # mkdir instances/zoot
+ # echo 100000 > buffer_size_kb
+ # echo 1000 > instances/foo/buffer_size_kb
+ # echo 5000 > instances/bar/per_cpu/cpu1/buffer_size_kb
+ # echo function > current_tracer
+ # echo 1 > instances/foo/events/sched/sched_wakeup/enable
+ # echo 1 > instances/foo/events/sched/sched_wakeup_new/enable
+ # echo 1 > instances/foo/events/sched/sched_switch/enable
+ # echo 1 > instances/bar/events/irq/enable
+ # echo 1 > instances/zoot/events/syscalls/enable
+ # cat trace_pipe
+CPU:2 [LOST 11745 EVENTS]
+ bash-2044 [002] .... 10594.481032: _raw_spin_lock_irqsave <-get_page_from_freelist
+ bash-2044 [002] d... 10594.481032: add_preempt_count <-_raw_spin_lock_irqsave
+ bash-2044 [002] d..1 10594.481032: __rmqueue <-get_page_from_freelist
+ bash-2044 [002] d..1 10594.481033: _raw_spin_unlock <-get_page_from_freelist
+ bash-2044 [002] d..1 10594.481033: sub_preempt_count <-_raw_spin_unlock
+ bash-2044 [002] d... 10594.481033: get_pageblock_flags_group <-get_pageblock_migratetype
+ bash-2044 [002] d... 10594.481034: __mod_zone_page_state <-get_page_from_freelist
+ bash-2044 [002] d... 10594.481034: zone_statistics <-get_page_from_freelist
+ bash-2044 [002] d... 10594.481034: __inc_zone_state <-zone_statistics
+ bash-2044 [002] d... 10594.481034: __inc_zone_state <-zone_statistics
+ bash-2044 [002] .... 10594.481035: arch_dup_task_struct <-copy_process
+[...]
+
+ # cat instances/foo/trace_pipe
+ bash-1998 [000] d..4 136.676759: sched_wakeup: comm=kworker/0:1 pid=59 prio=120 success=1 target_cpu=000
+ bash-1998 [000] dN.4 136.676760: sched_wakeup: comm=bash pid=1998 prio=120 success=1 target_cpu=000
+ <idle>-0 [003] d.h3 136.676906: sched_wakeup: comm=rcu_preempt pid=9 prio=120 success=1 target_cpu=003
+ <idle>-0 [003] d..3 136.676909: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=9 next_prio=120
+ rcu_preempt-9 [003] d..3 136.676916: sched_switch: prev_comm=rcu_preempt prev_pid=9 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
+ bash-1998 [000] d..4 136.677014: sched_wakeup: comm=kworker/0:1 pid=59 prio=120 success=1 target_cpu=000
+ bash-1998 [000] dN.4 136.677016: sched_wakeup: comm=bash pid=1998 prio=120 success=1 target_cpu=000
+ bash-1998 [000] d..3 136.677018: sched_switch: prev_comm=bash prev_pid=1998 prev_prio=120 prev_state=R+ ==> next_comm=kworker/0:1 next_pid=59 next_prio=120
+ kworker/0:1-59 [000] d..4 136.677022: sched_wakeup: comm=sshd pid=1995 prio=120 success=1 target_cpu=001
+ kworker/0:1-59 [000] d..3 136.677025: sched_switch: prev_comm=kworker/0:1 prev_pid=59 prev_prio=120 prev_state=S ==> next_comm=bash next_pid=1998 next_prio=120
+[...]
+
+ # cat instances/bar/trace_pipe
+ migration/1-14 [001] d.h3 138.732674: softirq_raise: vec=3 [action=NET_RX]
+ <idle>-0 [001] dNh3 138.732725: softirq_raise: vec=3 [action=NET_RX]
+ bash-1998 [000] d.h1 138.733101: softirq_raise: vec=1 [action=TIMER]
+ bash-1998 [000] d.h1 138.733102: softirq_raise: vec=9 [action=RCU]
+ bash-1998 [000] ..s2 138.733105: softirq_entry: vec=1 [action=TIMER]
+ bash-1998 [000] ..s2 138.733106: softirq_exit: vec=1 [action=TIMER]
+ bash-1998 [000] ..s2 138.733106: softirq_entry: vec=9 [action=RCU]
+ bash-1998 [000] ..s2 138.733109: softirq_exit: vec=9 [action=RCU]
+ sshd-1995 [001] d.h1 138.733278: irq_handler_entry: irq=21 name=uhci_hcd:usb4
+ sshd-1995 [001] d.h1 138.733280: irq_handler_exit: irq=21 ret=unhandled
+ sshd-1995 [001] d.h1 138.733281: irq_handler_entry: irq=21 name=eth0
+ sshd-1995 [001] d.h1 138.733283: irq_handler_exit: irq=21 ret=handled
+[...]
+
+ # cat instances/zoot/trace
+# tracer: nop
+#
+# entries-in-buffer/entries-written: 18996/18996 #P:4
+#
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ bash-1998 [000] d... 140.733501: sys_write -> 0x2
+ bash-1998 [000] d... 140.733504: sys_dup2(oldfd: a, newfd: 1)
+ bash-1998 [000] d... 140.733506: sys_dup2 -> 0x1
+ bash-1998 [000] d... 140.733508: sys_fcntl(fd: a, cmd: 1, arg: 0)
+ bash-1998 [000] d... 140.733509: sys_fcntl -> 0x1
+ bash-1998 [000] d... 140.733510: sys_close(fd: a)
+ bash-1998 [000] d... 140.733510: sys_close -> 0x0
+ bash-1998 [000] d... 140.733514: sys_rt_sigprocmask(how: 0, nset: 0, oset: 6e2768, sigsetsize: 8)
+ bash-1998 [000] d... 140.733515: sys_rt_sigprocmask -> 0x0
+ bash-1998 [000] d... 140.733516: sys_rt_sigaction(sig: 2, act: 7fff718846f0, oact: 7fff71884650, sigsetsize: 8)
+ bash-1998 [000] d... 140.733516: sys_rt_sigaction -> 0x0
+
+You can see that the trace of the top most trace buffer shows only
+the function tracing. The foo instance displays wakeups and task
+switches.
+
+To remove the instances, simply delete their directories:
+
+ # rmdir instances/foo
+ # rmdir instances/bar
+ # rmdir instances/zoot
+
+Note, if a process has a trace file open in one of the instance
+directories, the rmdir will fail with EBUSY.
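
The mkdir/rmdir life cycle described above can be wrapped so that an instance is removed when a script is done with it. This is a minimal Python sketch under stated assumptions: trace_instance is a hypothetical helper, the default tracing_dir path assumes debugfs is mounted at /sys/kernel/debug, and the EBUSY case is only reported, not retried.

```python
import contextlib
import errno
import os


@contextlib.contextmanager
def trace_instance(name, tracing_dir="/sys/kernel/debug/tracing"):
    """Create instances/<name> with mkdir and remove it with rmdir on exit."""
    path = os.path.join(tracing_dir, "instances", name)
    os.mkdir(path)
    try:
        yield path
    finally:
        try:
            os.rmdir(path)
        except OSError as exc:
            # rmdir fails with EBUSY while a process holds a trace file open.
            if exc.errno != errno.EBUSY:
                raise
            print(f"{path} is still in use; remove it later with rmdir")
```

A caller would use it as "with trace_instance('foo') as path:" and then write to files such as events/sched/sched_switch/enable under that path.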
+
+
+Stack trace
-----------
+Since the kernel has a fixed sized stack, it is important not to
+waste it in functions. A kernel developer must be conscious of
+what they allocate on the stack. If they add too much, the system
+can be in danger of a stack overflow, and corruption will occur,
+usually leading to a system panic.
+
+There are some tools that check this, usually with interrupts
+periodically checking usage. But if you can perform a check
+at every function call, that becomes very useful. As ftrace provides
+a function tracer, it makes it convenient to check the stack size
+at every function call. This is enabled via the stack tracer.
+
+CONFIG_STACK_TRACER enables the ftrace stack tracing functionality.
+To enable it, write a '1' into /proc/sys/kernel/stack_tracer_enabled.
+
+ # echo 1 > /proc/sys/kernel/stack_tracer_enabled
+
+You can also enable it from the kernel command line to trace
+the stack size of the kernel during boot up, by adding "stacktrace"
+to the kernel command line parameters.
+
+After running it for a few minutes, the output looks like:
+
+ # cat stack_max_size
+2928
+
+ # cat stack_trace
+ Depth Size Location (18 entries)
+ ----- ---- --------
+ 0) 2928 224 update_sd_lb_stats+0xbc/0x4ac
+ 1) 2704 160 find_busiest_group+0x31/0x1f1
+ 2) 2544 256 load_balance+0xd9/0x662
+ 3) 2288 80 idle_balance+0xbb/0x130
+ 4) 2208 128 __schedule+0x26e/0x5b9
+ 5) 2080 16 schedule+0x64/0x66
+ 6) 2064 128 schedule_timeout+0x34/0xe0
+ 7) 1936 112 wait_for_common+0x97/0xf1
+ 8) 1824 16 wait_for_completion+0x1d/0x1f
+ 9) 1808 128 flush_work+0xfe/0x119
+ 10) 1680 16 tty_flush_to_ldisc+0x1e/0x20
+ 11) 1664 48 input_available_p+0x1d/0x5c
+ 12) 1616 48 n_tty_poll+0x6d/0x134
+ 13) 1568 64 tty_poll+0x64/0x7f
+ 14) 1504 880 do_select+0x31e/0x511
+ 15) 624 400 core_sys_select+0x177/0x216
+ 16) 224 96 sys_select+0x91/0xb9
+ 17) 128 128 system_call_fastpath+0x16/0x1b
+
+Note, if -mfentry is being used by gcc, functions get traced before
+they set up the stack frame. This means that leaf level functions
+are not tested by the stack tracer when -mfentry is used.
+
+Currently, -mfentry is used by gcc 4.6.0 and above on x86 only.
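
The stack_trace listing shown above can be post-processed to find the functions that use the most stack. The Python sketch below is illustrative only: SAMPLE is an abridged copy of four rows from that listing, and the helper names are hypothetical. It assumes each data row has the form "index) depth size location".

```python
import re

# Abridged excerpt of the stack_trace output shown above.
SAMPLE = """\
        Depth    Size   Location    (18 entries)
        -----    ----   --------
  0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
 14)     1504     880   do_select+0x31e/0x511
 15)      624     400   core_sys_select+0x177/0x216
 17)      128     128   system_call_fastpath+0x16/0x1b
"""

ROW = re.compile(r"^\s*(\d+)\)\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_stack_trace(text):
    """Return (index, depth, size, location) tuples; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            index, depth, size, location = match.groups()
            rows.append((int(index), int(depth), int(size), location))
    return rows


def largest_frames(rows, n=3):
    """Return the n rows with the largest per-function stack size."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


for _, _, size, location in largest_frames(parse_stack_trace(SAMPLE), 2):
    print(f"{size:5d} bytes  {location}")
```

In the full listing the Size column adds up to the value reported by stack_max_size (2928), with do_select accounting for 880 bytes of it.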
+
+---------
More details can be found in the source code, in the
kernel/trace/*.c files.
diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index 24ce6823a09e..d9c3e682312c 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -1,6 +1,8 @@
- Uprobe-tracer: Uprobe-based Event Tracing
- =========================================
- Documentation written by Srikar Dronamraju
+ Uprobe-tracer: Uprobe-based Event Tracing
+ =========================================
+
+ Documentation written by Srikar Dronamraju
+
Overview
--------
@@ -13,78 +15,94 @@ current_tracer. Instead of that, add probe points via
/sys/kernel/debug/tracing/events/uprobes/<EVENT>/enabled.
However unlike kprobe-event tracer, the uprobe event interface expects the
-user to calculate the offset of the probepoint in the object
+user to calculate the offset of the probepoint in the object.
Synopsis of uprobe_tracer
-------------------------
- p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a probe
+ p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a uprobe
+ r[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a return uprobe (uretprobe)
+ -:[GRP/]EVENT : Clear uprobe or uretprobe event
- GRP : Group name. If omitted, use "uprobes" for it.
- EVENT : Event name. If omitted, the event name is generated
- based on SYMBOL+offs.
- PATH : path to an executable or a library.
- SYMBOL[+offs] : Symbol+offset where the probe is inserted.
+ GRP : Group name. If omitted, "uprobes" is the default value.
+ EVENT : Event name. If omitted, the event name is generated based
+ on SYMBOL+offs.
+ PATH : Path to an executable or a library.
+ SYMBOL[+offs] : Symbol+offset where the probe is inserted.
- FETCHARGS : Arguments. Each probe can have up to 128 args.
- %REG : Fetch register REG
+ FETCHARGS : Arguments. Each probe can have up to 128 args.
+ %REG : Fetch register REG
Event Profiling
---------------
- You can check the total number of probe hits and probe miss-hits via
+You can check the total number of probe hits and probe miss-hits via
/sys/kernel/debug/tracing/uprobe_profile.
- The first column is event name, the second is the number of probe hits,
+The first column is event name, the second is the number of probe hits,
the third is the number of probe miss-hits.
Usage examples
--------------
-To add a probe as a new event, write a new definition to uprobe_events
-as below.
+ * To add a probe as a new uprobe event, write a new definition to uprobe_events
+as below (this sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash):
+
+ echo 'p: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+
+ * Add a probe as a new uretprobe event:
+
+ echo 'r: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+
+ * Unset registered event:
- echo 'p: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+ echo '-:bash_0x4245c0' >> /sys/kernel/debug/tracing/uprobe_events
- This sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash
+ * Print out the events that are registered:
- echo > /sys/kernel/debug/tracing/uprobe_events
+ cat /sys/kernel/debug/tracing/uprobe_events
- This clears all probe points.
+ * Clear all events:
-The following example shows how to dump the instruction pointer and %ax
-a register at the probed text address. Here we are trying to probe
-function zfree in /bin/zsh
+ echo > /sys/kernel/debug/tracing/uprobe_events
+
+The following example shows how to dump the instruction pointer and %ax register
+at the probed text address. It probes the zfree function in /bin/zsh:
# cd /sys/kernel/debug/tracing/
- # cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
+ # cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh
# objdump -T /bin/zsh | grep -w zfree
0000000000446420 g DF .text 0000000000000012 Base zfree
-0x46420 is the offset of zfree in object /bin/zsh that is loaded at
-0x00400000. Hence the command to probe would be :
+ 0x46420 is the offset of zfree in object /bin/zsh that is loaded at
+ 0x00400000. Hence the command to uprobe would be:
+
+ # echo 'p:zfree_entry /bin/zsh:0x46420 %ip %ax' > uprobe_events
+
+ And the same for the uretprobe would be:
- # echo 'p /bin/zsh:0x46420 %ip %ax' > uprobe_events
+ # echo 'r:zfree_exit /bin/zsh:0x46420 %ip %ax' >> uprobe_events
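
The offset used in the commands above can be derived from the two outputs shown earlier. The following Python sketch works through that arithmetic; probe_offset is a hypothetical helper, and it assumes, as in the zsh example, that the address printed by objdump is the address at which the symbol is mapped (a non-relocated executable).

```python
def probe_offset(maps_line, symbol_address):
    """File offset of a symbol, given the r-xp line from /proc/<pid>/maps."""
    fields = maps_line.split()
    start, end = (int(part, 16) for part in fields[0].split("-"))
    file_offset = int(fields[2], 16)
    if not start <= symbol_address < end:
        raise ValueError("symbol address is outside this mapping")
    return symbol_address - start + file_offset


maps_line = "00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh"
zfree = 0x446420  # first column of `objdump -T /bin/zsh | grep -w zfree`
print(hex(probe_offset(maps_line, zfree)))  # 0x46420
```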
-Please note: User has to explicitly calculate the offset of the probepoint
+Please note: The user has to explicitly calculate the offset of the probe-point
in the object. We can see the events that are registered by looking at the
uprobe_events file.
# cat uprobe_events
- p:uprobes/p_zsh_0x46420 /bin/zsh:0x00046420 arg1=%ip arg2=%ax
+ p:uprobes/zfree_entry /bin/zsh:0x00046420 arg1=%ip arg2=%ax
+ r:uprobes/zfree_exit /bin/zsh:0x00046420 arg1=%ip arg2=%ax
-The format of events can be seen by viewing the file events/uprobes/p_zsh_0x46420/format
+The format of events can be seen by viewing the file events/uprobes/zfree_entry/format
- # cat events/uprobes/p_zsh_0x46420/format
- name: p_zsh_0x46420
+ # cat events/uprobes/zfree_entry/format
+ name: zfree_entry
ID: 922
format:
- field:unsigned short common_type; offset:0; size:2; signed:0;
- field:unsigned char common_flags; offset:2; size:1; signed:0;
- field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
- field:int common_pid; offset:4; size:4; signed:1;
- field:int common_padding; offset:8; size:4; signed:1;
+ field:unsigned short common_type; offset:0; size:2; signed:0;
+ field:unsigned char common_flags; offset:2; size:1; signed:0;
+ field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
+ field:int common_pid; offset:4; size:4; signed:1;
+ field:int common_padding; offset:8; size:4; signed:1;
- field:unsigned long __probe_ip; offset:12; size:4; signed:0;
- field:u32 arg1; offset:16; size:4; signed:0;
- field:u32 arg2; offset:20; size:4; signed:0;
+ field:unsigned long __probe_ip; offset:12; size:4; signed:0;
+ field:u32 arg1; offset:16; size:4; signed:0;
+ field:u32 arg2; offset:20; size:4; signed:0;
print fmt: "(%lx) arg1=%lx arg2=%lx", REC->__probe_ip, REC->arg1, REC->arg2
@@ -94,6 +112,7 @@ events, you need to enable it by:
# echo 1 > events/uprobes/enable
Lets disable the event after sleeping for some time.
+
# sleep 20
# echo 0 > events/uprobes/enable
@@ -104,10 +123,11 @@ And you can see the traced information via /sys/kernel/debug/tracing/trace.
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
- zsh-24842 [006] 258544.995456: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
- zsh-24842 [007] 258545.000270: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
- zsh-24842 [002] 258545.043929: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
- zsh-24842 [004] 258547.046129: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
-
-Each line shows us probes were triggered for a pid 24842 with ip being
-0x446421 and contents of ax register being 79.
+ zsh-24842 [006] 258544.995456: zfree_entry: (0x446420) arg1=446420 arg2=79
+ zsh-24842 [007] 258545.000270: zfree_exit: (0x446540 <- 0x446420) arg1=446540 arg2=0
+ zsh-24842 [002] 258545.043929: zfree_entry: (0x446420) arg1=446420 arg2=79
+ zsh-24842 [004] 258547.046129: zfree_exit: (0x446540 <- 0x446420) arg1=446540 arg2=0
+
+The output shows that the uprobe was triggered for pid 24842 with ip being
+0x446420 and the contents of the ax register being 79. The uretprobe was
+triggered with ip at 0x446540, with the counterpart function entry at 0x446420.
diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt
index 4204eb01fd38..1392b61d6ebe 100644
--- a/Documentation/usb/power-management.txt
+++ b/Documentation/usb/power-management.txt
@@ -33,6 +33,10 @@ built with CONFIG_USB_SUSPEND enabled (which depends on
CONFIG_PM_RUNTIME). System PM support is present only if the kernel
was built with CONFIG_SUSPEND or CONFIG_HIBERNATION enabled.
+(Starting with the 3.10 kernel release, dynamic PM support for USB is
+present whenever the kernel was built with CONFIG_PM_RUNTIME enabled.
+The CONFIG_USB_SUSPEND option has been eliminated.)
+
What is Remote Wakeup?
----------------------
@@ -206,10 +210,8 @@ initialized to 5. (The idle-delay values for already existing devices
will not be affected.)
Setting the initial default idle-delay to -1 will prevent any
-autosuspend of any USB device. This is a simple alternative to
-disabling CONFIG_USB_SUSPEND and rebuilding the kernel, and it has the
-added benefit of allowing you to enable autosuspend for selected
-devices.
+autosuspend of any USB device. This has the benefit of allowing you
+then to enable autosuspend for selected devices.
Warnings
diff --git a/Documentation/vm/overcommit-accounting b/Documentation/vm/overcommit-accounting
index 706d7ed9d8d2..8eaa2fc4b8fa 100644
--- a/Documentation/vm/overcommit-accounting
+++ b/Documentation/vm/overcommit-accounting
@@ -8,7 +8,9 @@ The Linux kernel supports the following overcommit handling modes
default.
1 - Always overcommit. Appropriate for some scientific
- applications.
+ applications. Classic example is code using sparse arrays
+ and just relying on the virtual memory consisting almost
+ entirely of zero pages.
2 - Don't overcommit. The total address space commit
for the system is not permitted to exceed swap + a
@@ -18,6 +20,10 @@ The Linux kernel supports the following overcommit handling modes
pages but will receive errors on memory allocation as
appropriate.
+ Useful for applications that want to guarantee their
+ memory allocations will be available in the future
+ without having to initialize every page.
+
The overcommit policy is set via the sysctl `vm.overcommit_memory'.
The overcommit percentage is set via `vm.overcommit_ratio'.
diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index 086638f6c82d..a0438f3957ca 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -1,6 +1,6 @@
The Linux WatchDog Timer Driver Core kernel API.
===============================================
-Last reviewed: 22-May-2012
+Last reviewed: 12-Feb-2013
Wim Van Sebroeck <wim@iguana.be>
@@ -212,3 +212,15 @@ driver specific data to and a pointer to the data itself.
The watchdog_get_drvdata function allows you to retrieve driver specific data.
The argument of this function is the watchdog device where you want to retrieve
data from. The function returns the pointer to the driver specific data.
+
+To initialize the timeout field, the following function can be used:
+
+extern int watchdog_init_timeout(struct watchdog_device *wdd,
+ unsigned int timeout_parm, struct device *dev);
+
+The watchdog_init_timeout function allows you to initialize the timeout field
+using the module timeout parameter or by retrieving the timeout-sec property from
+the device tree (if the module timeout parameter is invalid). Best practice is
+to set the default timeout value as the timeout value in the watchdog_device and
+then use this function to set the user "preferred" timeout value.
+This routine returns zero on success and a negative errno code for failure.