authorViresh Kumar <viresh.kumar@linaro.org>2017-08-14 14:14:00 +0530
committerViresh Kumar <viresh.kumar@linaro.org>2017-08-14 14:14:00 +0530
commitbaebab3ee5970d82f5e3225dfcffa3d8038d768f (patch)
tree9614f44c81fc20f94374c34575fcf946c335855f
init
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
-rw-r--r--  greybus/corbet_edited.txt                  494
-rw-r--r--  greybus/greybus.png                        bin 0 -> 333454 bytes
-rw-r--r--  greybus/greybus_subsystem.html             517
-rw-r--r--  greybus/greybus_subsystem.txt              580
-rw-r--r--  greybus/modules.png                        bin 0 -> 202619 bytes
-rw-r--r--  greybus/my_raw.txt                         559
-rw-r--r--  opp/compare.txt                            273
-rw-r--r--  opp/corbet_edited.txt                      260
-rw-r--r--  opp/intro.txt                              16
-rw-r--r--  opp/use_opp_to_do_dvfs.html                279
-rw-r--r--  opp/use_opp_to_do_dvfs.txt                 296
-rw-r--r--  workqueue/delayed-wq.png                   bin 0 -> 62741 bytes
-rw-r--r--  workqueue/power_efficient_workqueue.html   191
-rw-r--r--  workqueue/power_efficient_workqueue.txt    204
14 files changed, 3669 insertions, 0 deletions
diff --git a/greybus/corbet_edited.txt b/greybus/corbet_edited.txt
new file mode 100644
index 0000000..5a58875
--- /dev/null
+++ b/greybus/corbet_edited.txt
@@ -0,0 +1,494 @@
+The Greybus Subsystem
+=====================
+
+The Linux kernel gained a new subsystem during the 4.9 development cycle. That
+subsystem is Greybus and this article will briefly take you through its
+internals.
+
+Greybus was initially designed for Google’s Project ARA smartphone (now
+discontinued), but the first (and only) product released with it is Motorola’s
+Moto Mods. There are also ongoing discussions to evaluate the feasibility of
+using the protocols provided by Greybus in applications like the Internet of
+Things, and in other parts of the kernel that need to communicate in a
+platform-independent way.
+
+Initially, Greg Kroah-Hartman tried to merge the Greybus core into the kernel's
+drivers directory but, after some objections (people wanted to do more detailed
+reviews before merging it), everyone agreed to merge it into the staging tree
+instead. Almost 2400 patches, developed over 2.5 years, were merged; these
+contributions came from over 50 developers representing at least five
+organizations (Google, Linaro, BayLibre, LeafLabs, and MMSolutions). Many more
+developers and companies were involved in the development of the other parts of
+the ARA software and hardware. Greybus developers also showed up in the list of
+the most active developers (the top four by changesets) for the 4.9 release.
+
+Greg made sure that Greybus was merged with all its history preserved, saying:
+
+ "Because this was 2 1/2 years of work, with many many developers
+ contributing, I didn't want to flatten all of their effort into a few
+ small patches, as that wouldn't be very fair. So I've built a git tree
+ with all of the changes going back to the first commit, and merged it
+ into the kernel tree, just like btrfs was merged into the kernel."
+
+Jonathan Corbet wrote an article earlier on Greybus; readers may want to look
+at that to catch up on some history.
+
+UniPro and the internals of the Greybus subsystem
+-------------------------------------------------
+
+The Project ARA smartphone was designed to be customizable. The user could
+select a subset from a wide range of modules, providing interesting
+capabilities (like cameras, speakers, batteries, displays, sensors, etc), and
+attach them to the frame of the phone. The modules could communicate with the
+main processors or other modules directly over the UniPro bus. The
+specification of this bus is managed by the MIPI alliance. UniPro follows the
+architecture of the classical OSI network model, except that it has no
+application layer defined. And that’s where Greybus fits in.
+
+UniPro communication happens over bidirectional connections between entities,
+like the modules on the ARA smartphone; it doesn’t need to go through the
+processors. Each UniPro device has virtual ports within it, which can be seen
+as sub-addresses within the device. They are a lot like sockets and are called
+"connection ports" (or CPorts). There is a switch on the bus that sets up the
+actual routes. Messages can pass at a rate of around 10Gb/s; the bus also has
+message prioritization, error handling, and notification of delivery problems,
+though UniPro doesn’t support streams or multicast delivery.
+
+As the Greybus specification was initially written for the Project ARA
+smartphone, it is greatly inspired by ARA’s design, where modules can be
+inserted into or removed from the phone’s frame on the fly. A lot of effort has
+been put in to make the specification as generic as possible, in order to make
+it fit for other use cases as well. You will also notice a lot of similarities
+with the USB bus framework in the Linux kernel, as it was taken as a reference
+during the development of Greybus.
+
+The Greybus specification provides device discovery and description at runtime,
+network routing and housekeeping, and class and bridged PHY protocols, which
+devices use to talk to each other and to the processors. The following figure
+gives a glimpse of how various parts of the kernel interact with the Greybus
+Subsystem.
+
+image::./greybus.png[title="The Greybus Subsystem",height=500,width=800,align="center"]
+
+The Greybus Core implements the SVC protocol (described later), which is used
+by the application processor (AP — the CPUs running Linux) to communicate to
+the supervisory controller (SVC). The SVC represents an entity within the
+Greybus network which configures and controls the Greybus (UniPro) network,
+mostly based on the instructions from the AP. All module insertion and removal
+events are first reported to the SVC, which in turn informs the AP about them
+using the SVC protocol. The AP is responsible for administrating the Greybus
+network via the SVC.
+
+During initial development of the ARA smartphone, there were no SoCs available
+with inbuilt UniPro support. Separate hardware entities were designed to
+connect the AP to the UniPro network. These entities receive a message from the
+AP, which they translate and send to the UniPro network. The same is also
+required in the other direction: receiving messages from UniPro and translating
+them for the AP. These entities are called AP Bridge (APB) host controllers.
+They can receive messages over USB and send them over UniPro and vice versa.
+The AP isn’t really part of the Greybus network and so isn’t represented in the
+figure above. The Greybus subsystem also supports processors with inbuilt UniPro
+support; they are represented by native UniPro host controllers. The AP can
+talk directly to them without the USB subsystem.
+
+During module initialization (after the module is detected on Greybus), the
+Greybus core parses the module’s manifest, which describes the capabilities of
+the modules, and creates devices within the kernel to represent them.
+
+Power management for the whole UniPro network (i.e. AP, SVC and modules) is
+managed by the Greybus core. During system suspend, the Greybus core puts the
+SVC and the modules into low-power states and, on system resume, it brings up
+the Greybus network. The Greybus core also performs runtime power management
+for all individual entities. For example, if a module isn’t being used, the
+Greybus core will power it off and will bring it back only when it is required.
+
+
+The Greybus core also binds itself to the Linux kernel driver core and provides
+a sysfs interface at /sys/bus/greybus. The following diagram depicts the sysfs
+hierarchy for a single AP Bridge (APB) connected to the AP. A single module is
+accessible via the APB and the module presents a single interface which
+contains two bundles (devices) within it. The figure also represents the
+control CPort per interface and the SVC per APB, along with a list of
+attributes for each entity. All of these entities will be described later in
+detail.
+
+----
+
+greybus/
+└── greybus1 (AP Bridge)
+ ├── 1-2 (Module)
+ │ ├── 1-2.2 (Interface)
+ │ │ ├── 1-2.2.1 (Bundle)
+ │ │ │ ├── bundle_class
+ │ │ │ ├── bundle_id
+ │ │ │ └── state
+ │ │ ├── 1-2.2.2 (Bundle)
+ │ │ │ ├── bundle_class
+ │ │ │ ├── bundle_id
+ │ │ │ └── state
+ │ │ ├── 1-2.2.ctrl (Control CPort)
+ │ │ │ ├── product_string
+ │ │ │ └── vendor_string
+ │ │ ├── ddbl1_manufacturer_id
+ │ │ ├── ddbl1_product_id
+ │ │ ├── interface_id
+ │ │ ├── product_id
+ │ │ ├── serial_number
+ │ │ └── vendor_id
+ │ ├── eject
+ │ ├── module_id
+ │ └── num_interfaces
+ ├── 1-svc (SVC)
+ │ ├── ap_intf_id
+ │ ├── endo_id
+ │ └── intf_eject
+ └── bus_id
+
+----
+
+The functionality provided by the modules is exposed using device-class and
+bridged PHY drivers. The device-class drivers implement protocols whose purpose
+is to provide a device abstraction for the functionality commonly found on the
+mobile handsets. For example, cameras, batteries, sensors, etc. The bridged PHY
+drivers, instead, implement protocols whose purpose is to support communication
+with the modules on the Greybus network that do not comply with a device-class
+protocol; these include integrated circuits using alternative physical
+interfaces to UniPro. For example, devices connected via GPIO, I2C, SPI, USB,
+etc. The modules which only implement device-class protocols are said to be
+device-class conformant. Modules which implement any of the bridged PHY
+protocols are said to be non-device-class conformant. The device classes and
+bridged PHY protocols will be listed later.
+
+Module hierarchy
+~~~~~~~~~~~~~~~~
+
+A module is the physical hardware entity that can be connected to or
+disconnected from the Greybus network, either statically (before powering the
+system on) or dynamically (while the system is running). Once the modules are
+connected to the Greybus network, the AP and the SVC enumerate them and fetch
+each interface's manifest to learn about their capabilities. The following
+figure gives a glimpse of what the module hierarchy looks like in the Greybus
+subsystem.
+
+image::./modules.png[title="Module hierarchy",height=500,width=800,align="center"]
+
+Modules are represented within the Linux kernel by struct gb_module:
+
+struct gb_module {
+ struct device dev;
+ u8 module_id;
+ size_t num_interfaces;
+ struct gb_interface *interfaces[0];
+ ...
+};
+
+
+Here, dev is the module’s device structure, module_id is a unique eight-bit
+number assigned to the module by the SVC, interfaces points to the interfaces
+present within the module and num_interfaces is their count.
+
+
+The Greybus modules have electrical connectors on them, connecting them to the
+phone’s frame. These electrical connectors are called "interface blocks" and
+are represented in software by the term "interface". A module can have one or
+more interfaces. The interface with the smallest interface ID is configured as
+the primary interface and all others are called secondary interfaces. The
+module_id is set to the ID of the primary interface.
+
+The primary interface is special, as the AP receives the module insertion event
+with the ID of the primary interface, and the module can be ejected from the
+frame only using the primary interface. The interfaces can present any number of
+functionalities, which can be supported with the bandwidth available to the
+respective interface block. The interfaces are represented within the Linux
+kernel by the struct gb_interface:
+
+struct gb_interface {
+ struct device dev;
+ struct gb_control *control;
+ struct list_head bundles;
+ struct list_head manifest_descs;
+ u8 interface_id;
+ struct gb_module *module;
+ ...
+};
+
+
+Here, dev is the interface’s device structure, control represents the control
+connection (described below), bundles is the list containing bundles within the
+interface, manifest_descs is the list of descriptors created from the
+interface manifest, interface_id is the unique ID of the interface, and module
+is the pointer to the parent module structure. The module ID and interface ID
+both start from zero and are unique within the Greybus network.
+
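+As a minimal sketch of how these two structures relate, using only the fields
+shown above, a walk over a module's interfaces could look like the following
+(the function itself is invented purely for illustration):
+
+/* Walk all interfaces of a module and print their IDs. */
+static void example_print_interfaces(struct gb_module *module)
+{
+        size_t i;
+
+        pr_info("module %u has %zu interface(s)\n",
+                module->module_id, module->num_interfaces);
+
+        for (i = 0; i < module->num_interfaces; i++) {
+                struct gb_interface *intf = module->interfaces[i];
+
+                /* The module ID equals the ID of the primary interface. */
+                pr_info("  interface %u (primary: %s)\n", intf->interface_id,
+                        intf->interface_id == module->module_id ? "yes" : "no");
+        }
+}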
+
+The Greybus Interfaces can contain one or more bundles, each of which
+represents a logical Greybus device in the kernel. For example, an interface
+with vibrator and battery functionalities will have two bundles, one for the
+vibrator and one for the battery. Each bundle will get a struct device for
+itself and a greybus driver will bind to that device. The bundle ID is unique
+within an interface. The bundles are represented within the Linux kernel by the
+struct gb_bundle:
+
+struct gb_bundle {
+ struct device dev;
+ struct gb_interface *intf;
+ u8 id;
+ u8 class;
+ size_t num_cports;
+ struct list_head connections;
+ ...
+};
+
+Here, dev is the bundle’s device structure, intf is the pointer to the parent
+interface, id is the unique ID of the bundle within the interface, class is the
+class type of the bundle (like camera or audio), connections is the list of
+connections within the bundle, and num_cports is the count of those connections.
+
+The Greybus driver is represented by the following structure and it accepts the
+bundle structure as argument to all its callbacks:
+
+struct greybus_driver {
+ const char *name;
+ int (*probe)(struct gb_bundle *bundle,
+ const struct greybus_bundle_id *id);
+ void (*disconnect)(struct gb_bundle *bundle);
+ const struct greybus_bundle_id *id_table;
+ struct device_driver driver;
+};
+
+Here, name is the name of the Greybus driver, probe and disconnect are the
+callbacks, id_table is the device bundle id table and driver is the generic
+device driver structure.
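+
+To make this concrete, the following is a minimal sketch of a bundle driver
+built around this structure. The driver name, class value, and callback bodies
+are hypothetical, and the module_greybus_driver() and GREYBUS_DEVICE_CLASS()
+helpers are assumed to be the registration and id-table macros provided by the
+Greybus subsystem headers:
+
+static int example_probe(struct gb_bundle *bundle,
+                         const struct greybus_bundle_id *id)
+{
+        /* Look up the bundle's CPorts and set up its connections here. */
+        return 0;
+}
+
+static void example_disconnect(struct gb_bundle *bundle)
+{
+        /* Tear down whatever example_probe() created. */
+}
+
+static const struct greybus_bundle_id example_id_table[] = {
+        { GREYBUS_DEVICE_CLASS(0x12) },         /* hypothetical class value */
+        { }
+};
+
+static struct greybus_driver example_driver = {
+        .name           = "example",
+        .probe          = example_probe,
+        .disconnect     = example_disconnect,
+        .id_table       = example_id_table,
+};
+module_greybus_driver(example_driver);
+
+MODULE_LICENSE("GPL");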
+
+The Greybus or UniPro "connection" is a bidirectional communication path
+between two CPorts. There can be one or more CPorts within a bundle. The
+communication over the connections is governed by a predefined set of
+operations and the semantics of those operations is defined by the Greybus
+protocols (covered later). Each CPort is managed by exactly one protocol. The
+CPort numbers are unique within an interface. The first CPort within the
+interface is always the control CPort (CPort0, which is not part of any bundle)
+while the rest of the CPorts are numbered starting with one. CPort0 is special
+and is used for the management of its interface. It is governed by a special
+protocol, the control protocol (described later). The connections are
+represented within Linux kernel by the struct gb_connection:
+
+struct gb_connection {
+ struct gb_host_device *hd;
+ struct gb_interface *intf;
+ struct gb_bundle *bundle;
+ u16 hd_cport_id;
+ u16 intf_cport_id;
+ struct list_head operations;
+ ...
+};
+
+Here, hd represents the AP bridge through which the AP communicates with the
+module, intf represents the parent interface, bundle represents the parent
+bundle, hd_cport_id represents the CPort ID of the AP bridge, intf_cport_id
+represents the CPort ID of the interface, and operations is the list of
+operations being exchanged over the connection. The connection is
+established between hd_cport_id and intf_cport_id.
+
+Greybus bundles can also represent complex functionalities, such as audio or
+camera. Normally, such complex devices consist of multiple components working
+together, like sensors, DMA, bridge, DAIs, codecs, etc., and a single bundle
+device may look insufficient to represent them all. But that’s how Greybus
+represents such devices. The module side contains the firmware which makes all
+these components work together; it takes inputs from the AP over the
+connections present within the bundle. For example, a bundle representing the
+camera will have two connections: data and management. Management instructions
+are sent to the module, and configuration information is received from it, over
+the management connection, while the data from the camera on the module is
+received over the data connection. The internals of how various
+components work together to represent the camera are hidden from Greybus and
+hence the AP.
+
+When a module and its interfaces are connected to the Greybus network (by
+attaching the module to the frame of the phone), the AP starts enumerating its
+interfaces over CPort0. The AP fetches a block of data from the interfaces,
+called the interface manifest. The manifest is a data structure containing the
+manifest header along with a set of descriptors. The manifest allows the AP to
+learn about the capabilities of the interface.
+
+Following is a simple example of a raw manifest file that represents an
+interface which supports a single audio bundle. The manifest file is converted
+into a binary blob using the Manifesto library. In the following example, the
+bundle has two connections: "Management" and "Data". Note that it is optional
+to add the control CPort0 in the manifest file.
+
+; Simple Audio Interface Manifest
+;
+; Provided under the three clause BSD license found in the LICENSE file.
+
+[manifest-header]
+version-major = 0
+version-minor = 1
+
+[interface-descriptor]
+vendor-string-id = 1
+product-string-id = 2
+
+; Interface vendor string
+[string-descriptor 1]
+string = Project Ara
+
+; Interface product string
+[string-descriptor 2]
+string = Simple Audio Interface
+
+; Bundle 1: Audio class
+[bundle-descriptor 1]
+class = 0x12
+
+; Audio Management protocol on CPort 1
+[cport-descriptor 1]
+bundle = 1
+protocol = 0x12
+
+; Audio Data protocol on CPort 2
+[cport-descriptor 2]
+bundle = 1
+protocol = 0x13
+
+Greybus messages
+~~~~~~~~~~~~~~~~
+
+Greybus communication is built on UniPro messages, which are used to exchange
+information between the AP, SVC, and the modules. Normally all communication is
+bidirectional, in that for every request message from a sender, the receiver
+responds with a response message. Which entity (AP, SVC or the module) can
+initiate a request message depends on the individual operation as defined by
+the respective protocol. For example, only the AP can initiate operations on
+the control protocol. Some of the operations are unidirectional operations as
+well, where the receiver doesn’t need to generate a response message.
+
+Each message sent over UniPro begins with a short header, followed by
+operation-specific payload data. The message header is represented by the
+following structure:
+
+struct gb_operation_msg_hdr {
+ __le16 size; /* Size in bytes of header + payload */
+ __le16 operation_id; /* Operation unique id */
+ __u8 type; /* E.g GB_I2C_TYPE_TRANSFER */
+ __u8 result; /* Result of request (in responses only) */
+ __u8 pad[2]; /* must be zero (ignore when read) */
+};
+
+Here, size is the size of the header (8 bytes) plus the size of the payload data.
+The size of payload data is defined by each operation of every protocol. The
+operation_id is a unique, 16-bit number which is used to match request and
+response messages. The operation_id allows many operations to be "in flight" on
+a connection at once. The special ID zero is reserved for unidirectional
+operations. The operation type is an eight-bit number that defines the type of
+the operation. The meaning of the type value depends on the protocol in use.
+Only 127 operations are available for a given protocol (0x01..0x7f); operation
+0x00 is reserved. The most significant bit (0x80) of the operation type is used
+as a flag that distinguishes a request message from its response: for requests,
+this bit is zero; for responses, it is one. The result field is ignored for
+request messages; it contains the result of the requested operation in the
+response message.
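+
+As an illustration of this framing (all field values below are hypothetical and
+chosen only for the example), a request header and its matching response header
+could be filled in like this:
+
+/* Request: 8-byte header followed by a 4-byte payload (12 bytes in total). */
+struct gb_operation_msg_hdr request_hdr = {
+        .size         = cpu_to_le16(sizeof(request_hdr) + 4),
+        .operation_id = cpu_to_le16(42),   /* non-zero: a response is expected */
+        .type         = 0x02,              /* protocol-specific request type */
+};
+
+/* Response: same operation_id, type with bit 0x80 set, no payload. */
+struct gb_operation_msg_hdr response_hdr = {
+        .size         = cpu_to_le16(sizeof(response_hdr)),
+        .operation_id = cpu_to_le16(42),
+        .type         = 0x02 | 0x80,
+        .result       = 0x00,              /* success */
+};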
+
+Greybus messages (both request and response) are managed using the following
+structure within the Linux kernel:
+
+struct gb_message {
+ struct gb_operation *operation;
+ struct gb_operation_msg_hdr *header;
+ void *payload;
+ size_t payload_size;
+ ...
+};
+
+Here, operation is the operation to which the message belongs, header is the
+header to be sent over UniPro, payload is the payload to be sent following the
+header, and payload_size is the size of the payload.
+
+An entire Greybus operation (a request and its response) is managed using the
+following structure within the Linux kernel:
+
+struct gb_operation {
+ struct gb_connection *connection;
+ struct gb_message *request;
+ struct gb_message *response;
+ u8 type;
+ u16 id;
+ ...
+};
+
+Here, connection represents the communication path over which UniPro messages
+are sent, request and response represent the Greybus messages, and type and id
+are as described earlier in the message header.
+
+There are multiple helpers which can be used to send/receive Greybus messages
+over a connection, but most users end up using the following:
+
+int gb_operation_sync_timeout(struct gb_connection *connection, int type,
+ void *request, int request_size, void *response,
+ int response_size, unsigned int timeout);
+
+Here, connection represents the communication path over which UniPro messages
+are sent, type is the operation type, request is the request payload,
+request_size is the size of the request payload, response is the space for
+response payload, response_size is the size of the expected response payload,
+and timeout is the timeout period for the operation in milliseconds. Usually, a
+timeout of 1ms is chosen.
+
+gb_operation_sync_timeout() creates the operation and its messages, copies the
+request payload into the request message, and sends the request message header
+and its payload over the Greybus connection. It then waits for timeout
+milliseconds for a response from the other side and errors out if no response
+is received within that time. Otherwise, once the response is received, it
+checks the response header for the result of the operation. If the result
+field indicates an error, then gb_operation_sync_timeout() errors out.
+Otherwise, it copies the response payload into the memory pointed to by the
+response argument and destroys the operation and message structures. It returns
+zero on success or a negative number to indicate an error.
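+
+As an illustration of how a bundle driver might use this helper (the operation
+type, payload layout, and timeout value below are invented for the example and
+are not part of any real Greybus protocol):
+
+struct example_set_request {
+        __u8 enable;                       /* hypothetical request payload */
+};
+
+static int example_set(struct gb_connection *connection, bool enable)
+{
+        struct example_set_request request = {
+                .enable = enable ? 1 : 0,
+        };
+
+        /* No response payload is expected, so response/response_size are 0. */
+        return gb_operation_sync_timeout(connection, 0x02 /* hypothetical type */,
+                                         &request, sizeof(request),
+                                         NULL, 0, 1000 /* timeout in ms */);
+}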
+
+Greybus protocols
+~~~~~~~~~~~~~~~~~
+
+The Greybus protocols define the layout and semantics of Greybus messages that
+are exchanged over a Greybus connection. Each Greybus protocol defines a set of
+operations, with formats of their request and response messages. It also
+defines which side of the connection can initiate each request. The Greybus
+protocols are broadly divided into three categories: special protocols,
+device-class protocols and bridged PHY protocols.
+
+The special protocols are the core Greybus protocols with administrative powers
+over the Greybus network. There are two special protocols: SVC and Control.
+
+The SVC protocol serves the purpose of communication between the AP and the
+SVC. The AP controls the network via the SVC using this protocol. The CPort0 of
+the APB is used for the SVC connection (don’t confuse that with the CPort0 of
+each module interface which is used for the control protocol). The main purpose
+of this protocol is to help the AP create routes between various CPorts, sense
+module insertion or removal, etc. The modules on the Greybus network should not
+implement it. Operations defined under this protocol include module insertion
+and removal events, the creation and destruction of routes and connections, and
+more.
+
+The control protocol serves the purpose of communication between the AP and the
+module’s interfaces. The AP controls individual interfaces using this protocol.
+The main purpose of this protocol is to help the AP enumerate a new interface
+and learn about its capabilities. Only the AP can initiate operations (send
+requests) under this protocol and the module must respond to those requests.
+Operations allowed under this protocol include fetching an interface's
+manifest and activating, deactivating, suspending, or resuming a bundle.
+
+As mentioned earlier, the device-class protocols provide a device abstraction
+for the functionality commonly found on mobile handsets. A simple example of
+that can be the audio management protocol or the camera management protocol.
+The bridged PHY protocols provide communication with the modules on the Greybus
+network which do not comply with an existing device class Protocol, and which
+include integrated circuits using alternative physical interfaces to UniPro.
+
+What's next
+~~~~~~~~~~~
+
+It would be interesting to do a couple of things going forward. To begin with,
+the Greybus core should be moved from the staging tree to the kernel's drivers
+directory. The drivers themselves should be moved to their respective
+frameworks: .../greybus/gpio.c would become .../gpio/gpio-greybus.c, for
+example. This would take a reasonable amount of time and effort.
+
+Later on, it would be nice to get Motorola's Moto Mods support merged into the
+kernel and include its improvements to the Greybus subsystem. That will mostly
+depend on the Motorola community, though. Now that Project Ara has been
+discontinued, it would be quite interesting to find new targets (like the
+Internet of Things) for
+the Greybus subsystem and adapt Greybus to support them. Some discussions are
+going on about that.
diff --git a/greybus/greybus.png b/greybus/greybus.png
new file mode 100644
index 0000000..a5c81fc
--- /dev/null
+++ b/greybus/greybus.png
Binary files differ
diff --git a/greybus/greybus_subsystem.html b/greybus/greybus_subsystem.html
new file mode 100644
index 0000000..6dcae4d
--- /dev/null
+++ b/greybus/greybus_subsystem.html
@@ -0,0 +1,517 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta name="generator" content="AsciiDoc 8.6.9">
+<title>The Greybus Subsystem</title>
+</head>
+<body>
+<h1>The Greybus Subsystem</h1>
+<p>
+</p>
+<a name="preamble"></a>
+<p>The Linux kernel gained a new subsystem during the
+<a href="https://lwn.net/Articles/708766/">4.9</a> release, <strong>Greybus</strong> and this article
+will briefly take you through the internals of it.</p>
+<p>Greybus was initially designed for Google&#8217;s Project
+<a href="https://atap.google.com/ara/">ARA</a> smartphone (which is
+<a href="http://www.theverge.com/2016/9/2/12775922/google-project-ara-modular-phone-suspended-confirm">discontinued</a>
+now), but the first (and only) product released with it is Motorola&#8217;s
+<a href="https://www.motorola.com/us/moto-mods">Moto Mods</a>. There are also
+discussions going on to evaluate the feasibility of using the robust protocols
+provided by Greybus in non-Unipro applications like
+<a href="https://en.wikipedia.org/wiki/Internet_of_things">IoT</a>, and other parts of
+the kernel that need to communicate in a platform independent way.</p>
+<p>Initially, Greg Kroah-Hartman tried to merge Greybus core in the drivers
+directory directly (drivers/greybus), but after some objections (people
+wanted to do more detailed reviews before merging it) everyone agreed to merge
+it in the staging tree (drivers/staging/greybus). Almost <strong>2400</strong> patches,
+developed over 2.5 years, got merged with contributions from 50+ developers from
+at least 5 organizations (Google, <a href="https://www.linaro.org/">Linaro</a>,
+<a href="http://baylibre.com/">BayLibre</a>, <a href="http://www.leaflabs.com/">LeafLabs</a>,
+<a href="https://www.mm-sol.com/">MMSolutions</a>). There were a lot more developers and
+companies involved in developing other parts of the ARA software and hardware.
+Greybus developers also showed up in the
+<a href="https://lwn.net/Articles/708266/">list</a> of most active developers (Top four
+by changesets) for the 4.9 release.</p>
+<p>Greg made sure Greybus got merged with all its history preserved and he
+<a href="https://lwn.net/Articles/700618/">said</a>:</p>
+<pre><code>"Because this was 2 1/2 years of work, with many many developers
+contributing, I didn't want to flatten all of their effort into a few
+small patches, as that wouldn't be very fair. So I've built a git tree
+with all of the changes going back to the first commit, and merged it
+into the kernel tree, just like btrfs was merged into the kernel."</code></pre>
+<p>Jonathan Corbet wrote an <a href="https://lwn.net/Articles/648400/">article</a> earlier
+on Greybus. The readers may want to look at that to catch up on some history.</p>
+<hr>
+<h2><a name="_unipro_and_the_internals_of_the_greybus_subsystem"></a>UniPro And The Internals of the Greybus Subsystem</h2>
+<p>The Project ARA smartphone was designed to be customizable. The user can select
+a subset of physical modules from a wide range of modules, providing interesting
+capabilities (like cameras, speakers, batteries, displays, sensors, etc), and
+attach them to the frame of the phone. The modules can communicate with the main
+processors or other modules directly over the
+<a href="http://mipi.org/specifications/unipro-specifications">UniPro</a> bus.</p>
+<p>The specifications of the UniPro bus are defined by the
+<a href="http://mipi.org/">MIPI</a> alliance. UniPro follows the architecture of the
+classical OSI network model, except that it has no application layer defined.
+And that&#8217;s where Greybus fits in. Project Ara developers defined their own
+application layer for UniPro and that is known as <strong>Greybus</strong>.</p>
+<p>UniPro communication happens over bidirectional connections between entities,
+like the Modules on the ARA smartphone; it doesn&#8217;t need to go through the
+processors. Each UniPro device has virtual ports within it, which can be seen
+as sub-addresses within the device. They are a lot like sockets and are called
+<strong>Connection Ports</strong> (CPort). There is a switch on the bus that sets up the
+actual routes. Messages can pass at a rate of around 10Gb/s; the bus also has
+message prioritization, error handling, and notification of delivery problems,
+thought Unipro doesn&#8217;t support streams or multicast delivery.</p>
+<p>As the Greybus specification was initially written for the Project ARA
+smartphone, it is greatly inspired by ARA&#8217;s design, where modules can be
+inserted into or removed from the phone&#8217;s frame on the fly. A lot of efforts
+have been put to make the specification as generic as possible, in order to make
+it fit for other use cases as well. You will also notice a lot of similarities
+with the <a href="https://en.wikipedia.org/wiki/USB">USB</a> Bus framework in Linux
+kernel, as it was taken as a reference during the development of Greybus.</p>
+<p>Greybus specification provides:</p>
+<ul>
+<li>
+<p>
+Device discovery and description at runtime.
+</p>
+</li>
+<li>
+<p>
+Network routing and housekeeping.
+</p>
+</li>
+<li>
+<p>
+Class and Bridged PHY protocols, which devices use to talk to each other, and
+ the processors.
+</p>
+</li>
+</ul>
+<p>The following figure gives a glimpse of how various parts of the kernel interact
+with the Greybus Subsystem.</p>
+<div align="center">
+<img src="./greybus.png" style="border-width: 0;" alt="./greybus.png" width="800" height="500">
+<p><b>Figure 1. </b>The Greybus Subsystem</p>
+</div>
+<p>The <strong>Greybus Core</strong> at the center of the figure is the soul of the Greybus
+subsystem.</p>
+<p>The <strong>Application Processor</strong> (AP) represents the group of processors which run
+the host operating system, Linux in our case.</p>
+<p>The Greybus Core implements the SVC protocol (described later), which is used by
+the Application Processor (AP) to communicate to the <strong>Supervisory Controller</strong>
+(SVC). The SVC represents an entity within the Greybus network which configures
+and controls the Greybus (UniPro) network, mostly based on the instructions from
+the AP. All module insertion and removal events are first reported to the SVC,
+which in turn informs the AP about them using the SVC protocol. The AP is
+responsible for administrating the Greybus network via the SVC.</p>
+<p>During initial development of the ARA smartphone, there were no SoCs available
+with inbuilt UniPro support. Separate hardware entities were designed to connect
+the AP to the UniPro network. These entities receive a message from the AP
+(non-UniPro), then translate and send it to the UniPro network. The same is also
+required in the other direction, i.e. receiving messages from UniPro and translating
+them for the AP. These entities are called <strong>AP Bridge</strong> host controllers (APB).
+They can receive messages over USB and send them over UniPro and vice versa. The
+AP isn&#8217;t part of the Greybus network really and so isn&#8217;t represented in the
+above picture. The APB is connected to one side of the USB subsystem and the AP
+is on the other side of it. The Greybus subsystem also supports SoCs with
+inbuilt UniPro support and they are represented by <strong>Native UniPro</strong> host
+controller. The AP can talk directly to them without the USB subsystem.</p>
+<p>During module initialization (after the module is detected on Greybus), the
+Greybus Core parses Module&#8217;s <strong>Manifest</strong> (which describe the capabilities of the
+modules, described later) and create devices within the kernel to represent
+them.</p>
+<p><strong>Power management</strong> of the whole UniPro network (i.e. AP, SVC and modules) is
+managed by the Greybus Core. During system suspend, the Greybus core puts the
+SVC and the modules to low power states and on system resume it brings up the
+Greybus network. The Greybus Core also performs runtime PM for all individual
+entities. For example, if a module isn&#8217;t getting used currently, the Greybus
+core will power it off and will bring it back only when it is required to be
+used.</p>
+<p>The Greybus Core also binds itself to the Linux kernel driver core and provides
+a sysfs interface at /sys/bus/greybus. Following depicts the sysfs hierarchy for
+a single AP Bridge (APB) connected to the AP. A single Module is accessible via
+the APB and the module presents a single Interface which contains two Bundles
+within it. The figure also represents the Control CPort per interface and the
+SVC per APB, along with a list of attributes for each entity. All of these
+entities will be described later in detail.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>greybus/
+└── greybus1 (AP Bridge)
+ ├── 1-2 (Module)
+ │   ├── 1-2.2 (Interface)
+ │   │   ├── 1-2.2.1 (Bundle)
+ │   │   │   ├── bundle_class
+ │   │   │   ├── bundle_id
+ │   │   │   └── state
+ │   │   ├── 1-2.2.2 (Bundle)
+ │   │   │   ├── bundle_class
+ │   │   │   ├── bundle_id
+ │   │   │   └── state
+ │   │   ├── 1-2.2.ctrl (Control CPort)
+ │   │   │   ├── product_string
+ │   │   │   └── vendor_string
+ │   │   ├── ddbl1_manufacturer_id
+ │   │   ├── ddbl1_product_id
+ │   │   ├── interface_id
+ │   │   ├── product_id
+ │   │   ├── serial_number
+ │   │   └── vendor_id
+ │   ├── eject
+ │   ├── module_id
+ │   └── num_interfaces
+ ├── 1-svc (SVC)
+ │   ├── ap_intf_id
+ │   ├── endo_id
+ │   └── intf_eject
+ └── bus_id
+
+7 directories, 21 files</code></pre>
+</td></tr></table>
+<p>The functionality provided by the modules is exposed using Device class and
+Bridged PHY drivers. The <strong>Device class drivers</strong> implement protocols whose
+purpose is to provide a device abstraction for the functionality commonly found on
+mobile handsets. For example, camera, battery, sensors, etc. The <strong>Bridged PHY
+drivers</strong> implement protocols whose purpose is to support communication with the
+modules on the Greybus network which do not comply with a Device class
+protocol, and which include integrated circuits using alternative physical
+interfaces to UniPro. For example, GPIO, I2C, SPI, USB, etc. The modules which
+only implement Device class protocols are said to be device class conformant.
+Modules which implement any of the Bridged PHY protocols are said to be
+non-device class conformant. The Device class and Bridged PHY protocols will be
+listed later.</p>
+<h3><a name="_module_hierarchy"></a>Module Hierarchy</h3>
+<p>A <strong>Module</strong> is the physical hardware entity that can be connected or disconnected
+statically (before powering ON the system) or dynamically (while the system is
+running) from the Greybus network. Once the modules are connected to the Greybus
+network, the AP and the SVC enumerate the modules and fetch per interface
+manifest to learn about their capabilities.</p>
+<p>The following figure gives a glimpse of how the module hierarchy looks like in
+the Greybus subsystem.</p>
+<div align="center">
+<img src="./modules.png" style="border-width: 0;" alt="./modules.png" width="800" height="500">
+<p><b>Figure 2. </b>Module hierarchy</p>
+</div>
+<p>Modules are represented within the Linux kernel by the <strong>struct gb_module</strong>.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct gb_module {
+ struct device dev;
+ u8 module_id;
+ size_t num_interfaces;
+ struct gb_interface *interfaces[0];
+
+ ...
+};</code></pre>
+</td></tr></table>
+<p>Here, <strong>dev</strong> is the module&#8217;s device structure, <strong>module_id</strong> is a unique 8 bit
+number assigned to the module by the SVC, <strong>interfaces</strong> are the Interfaces
+present within the module and <strong>num_interfaces</strong> is their count.</p>
+<p>The Greybus modules have electrical connectors on them, which connect them to
+the phone&#8217;s frame. These electrical connectors are called as Interface Blocks
+and are represented in software by the term <strong>Interface</strong>. A module can have one
+or more Interfaces. The Interface with the smallest interface ID is configured as
+the primary interface and all others are called secondary interfaces. The
+<strong>module_id</strong> is set to the ID of the primary interface. The primary interface is
+special as the AP receives module insertion event with the ID of the primary
+interface and the module can be ejected from the frame only using the primary
+interface. The interfaces can present any number of functionalities (like:
+camera, audio, battery, etc), which can be supported with the bandwidth
+available to the respective Interface block. The interfaces are represented
+within the Linux kernel by the <strong>struct gb_interface</strong>.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct gb_interface {
+ struct device dev;
+ struct gb_control *control;
+
+ struct list_head bundles;
+ struct list_head manifest_descs;
+ u8 interface_id;
+ struct gb_module *module;
+
+ ...
+};</code></pre>
+</td></tr></table>
+<p>Here, <strong>dev</strong> is the interface&#8217;s device structure, <strong>control</strong> represents the
+Control connection (described later), <strong>bundles</strong> is the list containing bundles
+within the interface, <strong>manifest_descs</strong> is the lists of descriptors created from
+the interface manifest, <strong>interface_id</strong> is the unique ID of the interface, and
+<strong>module</strong> is the pointer to the parent module structure. Both module ID and
+interface ID start from 0 and are unique within the Greybus network.</p>
+<p>The Greybus Interfaces can contain one or more <strong>bundles</strong>. Each bundle represents
+a logical greybus device in the kernel. For example, an Interface with vibrator
+and battery functionalities will have two bundles. One bundle for the vibrator
+and one bundle for the battery functionality. Each bundle will get a <strong>struct
+device</strong> for itself and a greybus driver will bind to that device. The bundle ID
+is unique within an Interface and it starts from 1; 0 is reserved. The bundles
+are represented within the Linux kernel by the <strong>struct gb_bundle</strong>.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct gb_bundle {
+ struct device dev;
+ struct gb_interface *intf;
+
+ u8 id;
+ u8 class;
+ size_t num_cports;
+ struct list_head connections;
+
+ ...
+};</code></pre>
+</td></tr></table>
+<p>Here, <strong>dev</strong> is the bundle&#8217;s device structure, <strong>intf</strong> is the pointer to the
+parent interface, <strong>id</strong> is the unique ID of the bundle within the interface,
+<strong>class</strong> is the Class type of the bundle (like, camera or audio), <strong>connections</strong>
+is the connections within the bundle and <strong>num_cports</strong> is the count of the
+connections.</p>
+<p>The Greybus driver is represented by the following structure and it accepts the
+bundle structure as argument to all its callbacks.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct greybus_driver {
+ const char *name;
+
+ int (*probe)(struct gb_bundle *bundle,
+ const struct greybus_bundle_id *id);
+ void (*disconnect)(struct gb_bundle *bundle);
+
+ const struct greybus_bundle_id *id_table;
+
+ struct device_driver driver;
+};</code></pre>
+</td></tr></table>
+<p>Here, <strong>name</strong> is the name of the Greybus driver, <strong>probe</strong> and <strong>disconnect</strong> are the
+callbacks, <strong>id_table</strong> is the device bundle id table and <strong>driver</strong> is the generic
+device driver structure.</p>
+<p>The Greybus or UniPro <strong>connection</strong> is a bidirectional communication path between
+two CPorts. There can be one or more CPorts within a bundle. The communication
+over the connections is governed by a predefined set of operations and the
+semantics of those operations is defined by Greybus Protocols (covered later).
+Each CPort is managed by exactly one protocol. The CPort numbers are unique
+within an interface. The first CPort within the interface is always the control
+CPort0 (not part of any bundle) and the rest of the CPorts are numbered starting at 1
+and should be part of a bundle. CPort0 of an interface is special and is used
+for the management of the Interface. CPort0 of an interface is governed by a
+special protocol, Control Protocol (described later). The connections are
+represented within Linux kernel by the <strong>struct gb_connection</strong>.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct gb_connection {
+ struct gb_host_device *hd;
+ struct gb_interface *intf;
+ struct gb_bundle *bundle;
+ u16 hd_cport_id;
+ u16 intf_cport_id;
+ struct list_head operations;
+
+ ...
+};</code></pre>
+</td></tr></table>
+<p>Here, <strong>hd</strong> represents the AP bridge through which the AP communicates with the
+module, <strong>intf</strong> represents the parent interface, <strong>bundle</strong> represents the parent
+bundle, <strong>hd_cport_id</strong> represents the CPort ID of the AP bridge, <strong>intf_cport_id</strong>
+represents the CPort ID of the interface, and <strong>operations</strong> is the list of
+operations which are getting exchanged over the connection. The connection is
+established between hd_cport_id and intf_cport_id.</p>
+<p>The Greybus bundles can also represent complex functionalities, like audio or
+camera. Normally such complex devices consist of multiple components working
+together, like sensors, DMA, bridge, DAIs, codecs, etc., and a single bundle
+device may look insufficient to represent them. But that&#8217;s how Greybus
+represents such devices. The module side contains the real firmware which makes
+all these components work together and the module firmware takes inputs from the
+AP over the connections present within the bundle. For example, a bundle
+representing the camera will have two connections: data and management. All
+management instructions are sent to the module or configurations are received
+from the module using the management connection. And the data from the camera on
+the module is received over the data connection. The internals of how various
+components work together to represent the camera are hidden from Greybus and
+hence the AP.</p>
+<p>When a module and its interfaces are connected to the Greybus network (by
+attaching the module to the frame of the phone), the AP starts enumerating its
+interfaces over the CPort0. The AP fetches a block of data from the interfaces,
+called as the <strong>Interface Manifest</strong>. The manifest is a data structure, which
+contains manifest header along with a set of descriptors. The manifest allows
+the AP to learn about the capabilities of the interface.</p>
+<p>Following is a simple example of a raw manifest file, that represents an
+interface which supports a single <strong>Audio</strong> bundle. The manifest file is then
+converted into a binary blob using the
+<a href="https://github.com/projectara/manifesto">Manifesto</a> library. In the
+following example, the bundle has two connections: Management and Data. Note that
+it is optional to add the control CPort0 in the manifest file.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>; Simple Audio Interface Manifest
+;
+; Provided under the three clause BSD license found in the LICENSE file.
+
+[manifest-header]
+version-major = 0
+version-minor = 1
+
+[interface-descriptor]
+vendor-string-id = 1
+product-string-id = 2
+
+; Interface vendor string
+[string-descriptor 1]
+string = Project Ara
+
+; Interface product string
+[string-descriptor 2]
+string = Simple Audio Interface
+
+; Bundle 1: Audio class
+[bundle-descriptor 1]
+class = 0x12
+
+; Audio Management protocol on CPort 1
+[cport-descriptor 1]
+bundle = 1
+protocol = 0x12
+
+; Audio Data protocol on CPort 2
+[cport-descriptor 2]
+bundle = 1
+protocol = 0x13</code></pre>
+</td></tr></table>
+<h3><a name="_greybus_messages"></a>Greybus Messages</h3>
+<p>Greybus communication is built on UniPro messages, which are used to exchange
+information between the AP, SVC, and the modules. Normally all communication is
+bidirectional, i.e. for every request message from the sender, the receiver responds
+with a response message. Which entity (AP, SVC or the Module) can initiate a
+request message depends on the individual operation as defined by the respective
+protocol. For example, only the AP can initiate operations on the Control
+protocol. Some of the operations are unidirectional as well, where
+the receiver doesn&#8217;t need to respond with a response message.</p>
+<p>Each message sent over UniPro begins with a short header, and is followed by
+an operation specific payload data. The message header is represented by
+following structure:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct gb_operation_msg_hdr {
+ __le16 size; /* Size in bytes of header + payload */
+ __le16 operation_id; /* Operation unique id */
+ __u8 type; /* E.g GB_I2C_TYPE_TRANSFER */
+ __u8 result; /* Result of request (in responses only) */
+ __u8 pad[2]; /* must be zero (ignore when read) */
+} __packed;</code></pre>
+</td></tr></table>
+<p>Here, <strong>size</strong> is the size of the header (8 bytes) plus size of the payload data.
+The size of payload data is defined by each operation of every protocol.</p>
+<p>The <strong>operation_id</strong> is a unique 16 bit number which is used to match request and
+response messages. The operation_id allows many operations to be "in flight" on
+a connection at once. The special ID 0 is reserved for unidirectional
+operations.</p>
+<p>The operation <strong>type</strong> is an 8 bit number that defines the type of the operation.
+The meaning of the type value depends on the protocol in use on the connection
+carrying the message. Only 127 operations are available for a given Protocol
+0x01..0x7f; Operation 0x00 is reserved. The most significant bit (0x80) of an
+operation type is used as a flag that distinguishes a request message from its
+response. For requests, this bit is 0, for responses, it is 1.</p>
+<p>The <strong>result</strong> field is ignored for the request messages and it contains the
+result of the requested operation in the response message.</p>
+<p>The Greybus messages (both request and response) are managed using the following
+structure within the Linux kernel:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct gb_message {
+ struct gb_operation *operation;
+ struct gb_operation_msg_hdr *header;
+
+ void *payload;
+ size_t payload_size;
+
+ ...
+};</code></pre>
+</td></tr></table>
+<p>Here, <strong>operation</strong> is the operation to which the message belongs, <strong>header</strong> is the
+header to be sent over unipro, <strong>payload</strong> is the payload to be sent following the
+header and <strong>payload_size</strong> is the size of the payload.</p>
+<p>The entire Greybus Operation (request + response) is managed using the following
+structure within the Linux kernel:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct gb_operation {
+ struct gb_connection *connection;
+ struct gb_message *request;
+ struct gb_message *response;
+
+ u8 type;
+ u16 id;
+
+ ...
+};</code></pre>
+</td></tr></table>
+<p>Here, <strong>connection</strong> represents the communication path over which Unipro messages
+are sent, <strong>request</strong> and <strong>response</strong> represent the Greybus messages, and <strong>type</strong>
+and <strong>id</strong> are as described earlier in the message header.</p>
+<p>There are multiple helpers which can be used to send/receive Greybus messages
+over a connection, but most of the users end up using following:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>int gb_operation_sync_timeout(struct gb_connection *connection, int type,
+ void *request, int request_size, void *response,
+ int response_size, unsigned int timeout);</code></pre>
+</td></tr></table>
+<p>Here, <strong>connection</strong> represents the communication path over which Unipro messages
+are sent, <strong>type</strong> is the operation type, <strong>request</strong> is the request payload,
+<strong>request_size</strong> is the size of the request payload, <strong>response</strong> is the space for
+response payload, <strong>response_size</strong> is the size of the expected response payload,
+and <strong>timeout</strong> is the timeout period for the operation in milliseconds. Mostly a
+timeout of 1 millisecond is chosen.</p>
+<p>gb_operation_sync_timeout() first creates the operation and its messages, copies
+the request payload into the request message, and then sends the request message
+header and its payload over the Greybus connection. It then waits for "timeout"
+milliseconds for a response from the other side and errors out if no response is
+received within that time. Otherwise, once the response is received, it first
+checks the response header to check the result of the operation. If result field
+indicates an error, then gb_operation_sync_timeout() errors out. Otherwise it
+copies the response payload in the memory pointed by the "response" field and
+then destroys the operation and message structures. It returns 0 on success or a
+negative number to represent errors.</p>
+<h3><a name="_greybus_protocols"></a>Greybus Protocols</h3>
+<p>The Greybus Protocols define the layout and semantics of the Greybus messages,
+which may be exchanged over a Greybus connection. Each Greybus protocol defines
+a set of operations, with formats of their request and response messages. It
+also states which side of the connection can initiate the request, i.e. the AP,
+the module or the SVC. The Greybus protocols are broadly divided into three
+categories: Special protocols, Device class protocols and Bridged PHY protocols.</p>
+<p>The <strong>Special</strong> protocols are the core Greybus protocols with administrating
+powers over the Greybus network. There are two special protocols: SVC and
+Control.</p>
+<p>The <strong>SVC</strong> protocol serves the purpose of communication between the AP and the
+SVC. The AP controls the network via the SVC using this protocol. The CPort0 of
+the APB is used for the SVC connection (Don&#8217;t confuse that with the CPort0 of
+each module interface which is used for the control protocol). The main purpose of
+this protocol is to help the AP create routes between various CPorts, sense
+module insertion or removal, etc. The modules on the Greybus network should not
+implement it. Some of the operations allowed under this protocol are: Module
+inserted/removed events, Create/destroy route, Create/destroy connection,
+Interface eject, Interface activate, Interface resume, etc.</p>
+<p>The <strong>Control</strong> protocol serves the purpose of communication between the AP and
+the module&#8217;s interfaces. The AP controls individual interfaces using this
+protocol. The main purpose of this protocol is to help the AP enumerate a new
+interface and learn about its capabilities. Only the AP can initiate operations
+(send requests) under this protocol and the module needs to respond to those
+requests. Some of the operations allowed under this protocol are: Get interface
+manifest, Bundle suspend/resume/activate/deactivate, Get bundle version, etc.</p>
+<p>As mentioned earlier, the <strong>Device Class</strong> protocols provide a device abstraction
+for the functionality commonly found on mobile handsets. A simple example of
+that can be the audio management protocol or the camera management protocol.
+Following are various types of such protocols implemented in the Linux kernel
+Greybus subsystem: Audio Management Protocol, Camera Management Protocol,
+Firmware Management Protocol, Firmware Download Protocol, Component
+Authentication Protocol, HID Protocol, Lights Protocol, Log Protocol, Power
+Supply Protocol, Loopback Protocol, Raw Protocol, and Vibrator Protocol.</p>
+<p>The <strong>Bridged PHY</strong> protocols provide communication with the modules on the
+Greybus network which do not comply with an existing device class Protocol, and
+which include integrated circuits using alternative physical interfaces to
+UniPro. Following are various types of such protocols implemented in the Linux
+kernel Greybus subsystem: USB Protocol, GPIO Protocol, SPI Protocol, SDIO
+Protocol, UART Protocol, PWM Protocol, and I2C Protocol.</p>
+<p>The individual protocols aren&#8217;t described in great detail here to keep this
+article short.</p>
+<hr>
+<h2><a name="_thanks"></a>Thanks</h2>
+<p>Thanks to Jonathan Corbet for his help in reviewing this article.</p>
+<p></p>
+<p></p>
+<hr><p><small>
+Last updated
+ 2017-02-08 17:10:59 IST
+</small></p>
+</body>
+</html>
diff --git a/greybus/greybus_subsystem.txt b/greybus/greybus_subsystem.txt
new file mode 100644
index 0000000..6300e31
--- /dev/null
+++ b/greybus/greybus_subsystem.txt
@@ -0,0 +1,580 @@
+The Greybus Subsystem
+=====================
+
+The Linux kernel gained a new subsystem during the
+link:https://lwn.net/Articles/708766/[4.9] release, *Greybus*, and this article
+will briefly take you through its internals.
+
+Greybus was initially designed for Google's Project
+link:https://atap.google.com/ara/[ARA] smartphone (which is
+link:http://www.theverge.com/2016/9/2/12775922/google-project-ara-modular-phone-suspended-confirm[discontinued]
+now), but the first (and only) product released with it is Motorola's
+link:https://www.motorola.com/us/moto-mods[Moto Mods]. There are also
+discussions going on to evaluate the feasibility of using the robust protocols
+provided by Greybus in non-Unipro applications like
+link:https://en.wikipedia.org/wiki/Internet_of_things[IoT], and other parts of
+the kernel that need to communicate in a platform independent way.
+
+Initially, Greg Kroah-Hartman tried to merge Greybus core in the drivers
+directory directly (drivers/greybus), but after some objections (people
+wanted to do more detailed reviews before merging it) everyone agreed to merge
+it in the staging tree (drivers/staging/greybus). Almost *2400* patches,
+developed over 2.5 years, got merged with contributions from 50+ developers from
+at least 5 organizations (Google, link:https://www.linaro.org/[Linaro],
+link:http://baylibre.com/[BayLibre], link:http://www.leaflabs.com/[LeafLabs],
+link:https://www.mm-sol.com/[MMSolutions]). There were a lot more developers and
+companies involved in developing other parts of the ARA software and hardware.
+Greybus developers also showed up in the
+link:https://lwn.net/Articles/708266/[list] of most active developers (Top four
+by changesets) for the 4.9 release.
+
+Greg made sure Greybus got merged with all its history preserved and he
+link:https://lwn.net/Articles/700618/[said]:
+
+ "Because this was 2 1/2 years of work, with many many developers
+ contributing, I didn't want to flatten all of their effort into a few
+ small patches, as that wouldn't be very fair. So I've built a git tree
+ with all of the changes going back to the first commit, and merged it
+ into the kernel tree, just like btrfs was merged into the kernel."
+
+Jonathan Corbet wrote an link:https://lwn.net/Articles/648400/[article] earlier
+on Greybus; readers may want to look at that to catch up on some history.
+
+
+UniPro And The Internals of the Greybus Subsystem
+-------------------------------------------------
+
+The Project ARA smartphone was designed to be customizable. The user can select
+a subset of physical modules from a wide range of modules, providing interesting
+capabilities (like cameras, speakers, batteries, displays, sensors, etc), and
+attach them to the frame of the phone. The modules can communicate with the main
+processors or other modules directly over the
+link:http://mipi.org/specifications/unipro-specifications[UniPro] bus.
+
+The specifications of the UniPro bus are defined by the
+link:http://mipi.org/[MIPI] alliance. UniPro follows the architecture of the
+classical OSI network model, except that it has no application layer defined.
+And that's where Greybus fits in. Project Ara developers defined their own
+application layer for Unipro and that is known as *Greybus*.
+
+UniPro communication happens over bidirectional connections between entities,
+like the Modules on the ARA smartphone; it doesn't need to go through the
+processors. Each UniPro device has virtual ports within it, which can be seen
+as sub-addresses within the device. They are a lot like sockets and are called
+*Connection Ports* (CPorts). There is a switch on the bus that sets up the
+actual routes. Messages can pass at a rate of around 10Gb/s; the bus also has
+message prioritization, error handling, and notification of delivery problems,
+though UniPro doesn't support streams or multicast delivery.
+
+As the Greybus specification was initially written for the Project ARA
+smartphone, it is greatly inspired by ARA's design, where modules can be
+inserted into or removed from the phone's frame on the fly. A lot of effort
+has been put into making the specification as generic as possible, in order to
+make it fit other use cases as well. You will also notice a lot of similarities
+with the link:https://en.wikipedia.org/wiki/USB[USB] bus framework in the Linux
+kernel, as it was taken as a reference during the development of Greybus.
+
+Greybus specification provides:
+
+* Device discovery and description at runtime.
+* Network routing and housekeeping.
+* Class and Bridged PHY protocols, which devices use to talk to each other, and
+ the processors.
+
+The following figure gives a glimpse of how various parts of the kernel interact
+with the Greybus Subsystem.
+
+image::./greybus.png[title="The Greybus Subsystem",height=500,width=800,align="center"]
+
+The *Greybus Core* at the center of the figure is the soul of the Greybus
+subsystem.
+
+The *Application Processor* (AP) represents the group of processors which run
+the host operating system, Linux in our case.
+
+The Greybus Core implements the SVC protocol (described later), which is used by
+the Application Processor (AP) to communicate to the *Supervisory Controller*
+(SVC). The SVC represents an entity within the Greybus network which configures
+and controls the Greybus (UniPro) network, mostly based on the instructions from
+the AP. All module insertion and removal events are first reported to the SVC,
+which in turn informs the AP about them using the SVC protocol. The AP is
+responsible for administrating the Greybus network via the SVC.
+
+During initial development of the ARA smartphone, there were no SoCs available
+with inbuilt UniPro support. Separate hardware entities were designed to connect
+the AP to the UniPro network. These entities receive (non-UniPro) messages
+from the AP, then translate and forward them to the UniPro network; the same is
+also required in the other direction, i.e. receiving messages from UniPro and
+translating them for the AP. These entities are called *AP Bridge* host
+controllers (APBs). They can receive messages over USB and send them over
+UniPro, and vice versa. The AP isn't really part of the Greybus network and so
+isn't represented in the above picture; the APB is connected to one side of the
+USB subsystem and the AP is on the other side of it. The Greybus subsystem also
+supports SoCs with inbuilt UniPro support; they are represented by the *Native
+UniPro* host controller, and the AP can talk to them directly without the USB
+subsystem.
+
+During module initialization (after the module is detected on Greybus), the
+Greybus Core parses the module's *Manifest* (which describes the capabilities
+of the module, described later) and creates devices within the kernel to
+represent them.
+
+*Power management* of the whole UniPro network (i.e. AP, SVC and modules) is
+managed by the Greybus Core. During system suspend, the Greybus core puts the
+SVC and the modules into low-power states, and on system resume it brings the
+Greybus network back up. The Greybus Core also performs runtime PM for all
+individual entities; for example, if a module isn't currently being used, the
+Greybus core will power it off and bring it back up only when it is needed
+again.
+
+The Greybus Core also binds itself to the Linux kernel driver core and provides
+a sysfs interface at /sys/bus/greybus. The following depicts the sysfs hierarchy for
+a single AP Bridge (APB) connected to the AP. A single Module is accessible via
+the APB and the module presents a single Interface which contains two Bundles
+within it. The figure also represents the Control CPort per interface and the
+SVC per APB, along with a list of attributes for each entity. All of these
+entities will be described later in detail.
+
+----
+
+greybus/
+└── greybus1 (AP Bridge)
+ ├── 1-2 (Module)
+ │   ├── 1-2.2 (Interface)
+ │   │   ├── 1-2.2.1 (Bundle)
+ │   │   │   ├── bundle_class
+ │   │   │   ├── bundle_id
+ │   │   │   └── state
+ │   │   ├── 1-2.2.2 (Bundle)
+ │   │   │   ├── bundle_class
+ │   │   │   ├── bundle_id
+ │   │   │   └── state
+ │   │   ├── 1-2.2.ctrl (Control CPort)
+ │   │   │   ├── product_string
+ │   │   │   └── vendor_string
+ │   │   ├── ddbl1_manufacturer_id
+ │   │   ├── ddbl1_product_id
+ │   │   ├── interface_id
+ │   │   ├── product_id
+ │   │   ├── serial_number
+ │   │   └── vendor_id
+ │   ├── eject
+ │   ├── module_id
+ │   └── num_interfaces
+ ├── 1-svc (SVC)
+ │   ├── ap_intf_id
+ │   ├── endo_id
+ │   └── intf_eject
+ └── bus_id
+
+7 directories, 21 files
+
+----
+
+The functionality provided by the modules is exposed using Device class and
+Bridged PHY drivers. The *Device class drivers* implement protocols whose
+purpose is to provide a device abstraction for the functionality commonly
+found on mobile handsets; for example camera, battery, sensors, etc. The
+*Bridged PHY drivers* implement protocols whose purpose is to support
+communication with the modules on the Greybus network which do not comply with
+a Device class protocol, and which include integrated circuits using
+alternative physical interfaces to UniPro; for example GPIO, I2C, SPI, USB,
+etc. The modules which
+only implement Device class protocols are said to be device class conformant.
+Modules which implement any of the Bridged PHY protocols are said to be
+non-device class conformant. The Device class and Bridged PHY protocols will be
+listed later.
+
+Module Hierarchy
+~~~~~~~~~~~~~~~~
+
+A *Module* is the physical hardware entity that can be connected to or
+disconnected from the Greybus network, either statically (before powering on
+the system) or dynamically (while the system is running). Once the modules are
+connected to the Greybus network, the AP and the SVC enumerate them and fetch
+their per-interface manifests to learn about their capabilities.
+
+The following figure gives a glimpse of what the module hierarchy looks like
+in the Greybus subsystem.
+
+image::./modules.png[title="Module hierarchy",height=500,width=800,align="center"]
+
+Modules are represented within the Linux kernel by the *struct gb_module*.
+
+----
+struct gb_module {
+ struct device dev;
+ u8 module_id;
+ size_t num_interfaces;
+ struct gb_interface *interfaces[0];
+
+ ...
+};
+----
+
+Here, *dev* is the module's device structure, *module_id* is a unique 8 bit
+number assigned to the module by the SVC, *interfaces* are the Interfaces
+present within the module and *num_interfaces* is their count.
+
+
+The Greybus modules have electrical connectors on them, which connect them to
+the phone's frame. These electrical connectors are called Interface Blocks
+and are represented in software by the term *Interface*. A module can have one
+or more Interfaces. The Interface with the smallest interface ID is configured
+as the primary interface and all others are called secondary interfaces. The
+*module_id* is set to the ID of the primary interface. The primary interface is
+special: the AP receives the module insertion event with the ID of the primary
+interface, and the module can be ejected from the frame only using the primary
+interface. The interfaces can present any number of functionalities (like
+camera, audio, battery, etc.), as long as they can be supported with the
+bandwidth available to the respective Interface Block. The interfaces are
+represented within the Linux kernel by *struct gb_interface*.
+
+----
+struct gb_interface {
+ struct device dev;
+ struct gb_control *control;
+
+ struct list_head bundles;
+ struct list_head manifest_descs;
+ u8 interface_id;
+ struct gb_module *module;
+
+ ...
+};
+----
+
+Here, *dev* is the interface's device structure, *control* represents the
+Control connection (described later), *bundles* is the list containing bundles
+within the interface, *manifest_descs* is the list of descriptors created from
+the interface manifest, *interface_id* is the unique ID of the interface, and
+*module* is the pointer to the parent module structure. Both module ID and
+interface ID start from 0 and are unique within the Greybus network.
+
+
+A Greybus Interface can contain one or more *bundles*. Each bundle represents
+a logical Greybus device in the kernel. For example, an Interface with vibrator
+and battery functionalities will have two bundles: one for the vibrator and one
+for the battery functionality. Each bundle gets a *struct device* for itself
+and a Greybus driver binds to that device. The bundle ID is unique within an
+Interface and starts from 1; 0 is reserved. The bundles are represented within
+the Linux kernel by *struct gb_bundle*.
+
+----
+struct gb_bundle {
+ struct device dev;
+ struct gb_interface *intf;
+
+ u8 id;
+ u8 class;
+ size_t num_cports;
+ struct list_head connections;
+
+ ...
+};
+----
+
+Here, *dev* is the bundle's device structure, *intf* is the pointer to the
+parent interface, *id* is the unique ID of the bundle within the interface,
+*class* is the class type of the bundle (like camera or audio), *connections*
+is the list of connections within the bundle, and *num_cports* is the count of
+those connections.
+
+
+The Greybus driver is represented by the following structure, and it accepts
+the bundle structure as an argument in all its callbacks.
+
+----
+struct greybus_driver {
+ const char *name;
+
+ int (*probe)(struct gb_bundle *bundle,
+ const struct greybus_bundle_id *id);
+ void (*disconnect)(struct gb_bundle *bundle);
+
+ const struct greybus_bundle_id *id_table;
+
+ struct device_driver driver;
+};
+----
+
+Here, *name* is the name of the Greybus driver, *probe* and *disconnect* are the
+callbacks, *id_table* is the device bundle id table and *driver* is the generic
+device driver structure.
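+
+To make this concrete, the following is a minimal, hypothetical bundle driver
+sketch. It assumes the registration helpers provided by the staging Greybus
+headers (module_greybus_driver() and the GREYBUS_DEVICE_CLASS() id-table
+macro); the driver name, class value, and callback bodies are made up purely
+for illustration.
+
+----
+#include <linux/module.h>
+
+#include "greybus.h"
+
+/*
+ * Hypothetical ID table: class 0x12 matches the audio class used in the
+ * manifest example later in this article.
+ */
+static const struct greybus_bundle_id example_id_table[] = {
+	{ GREYBUS_DEVICE_CLASS(0x12) },
+	{ }
+};
+
+static int example_probe(struct gb_bundle *bundle,
+			 const struct greybus_bundle_id *id)
+{
+	/* Create the bundle's connections and register class devices here. */
+	return 0;
+}
+
+static void example_disconnect(struct gb_bundle *bundle)
+{
+	/* Undo whatever example_probe() set up. */
+}
+
+static struct greybus_driver example_driver = {
+	.name		= "example",
+	.probe		= example_probe,
+	.disconnect	= example_disconnect,
+	.id_table	= example_id_table,
+};
+module_greybus_driver(example_driver);
+
+MODULE_LICENSE("GPL");
+----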
+
+
+The Greybus or UniPro *connection* is a bidirectional communication path between
+two CPorts. There can be one or more CPorts within a bundle. The communication
+over the connections is governed by a predefined set of operations and the
+semantics of those operations are defined by Greybus Protocols (covered later).
+Each CPort is managed by exactly one protocol. CPort numbers are unique within
+an interface. The first CPort within an interface is always the control CPort0
+(not part of any bundle); the rest of the CPorts are numbered starting from 1
+and must be part of a bundle. CPort0 of an interface is special: it is used for
+the management of the Interface and is governed by a special protocol, the
+Control Protocol (described later). The connections are represented within the
+Linux kernel by *struct gb_connection*.
+
+----
+struct gb_connection {
+ struct gb_host_device *hd;
+ struct gb_interface *intf;
+ struct gb_bundle *bundle;
+ u16 hd_cport_id;
+ u16 intf_cport_id;
+ struct list_head operations;
+
+ ...
+};
+----
+
+Here, *hd* represents the AP bridge through which the AP is communicating to the
+module, *intf* represents the parent interface, *bundle* represents the parent
+bundle, *hd_cport_id* represents the CPort ID of the AP bridge, *intf_cport_id*
+represents the CPort ID of the interface, and *operations* is the list of
+operations which are getting exchanged over the connection. The connection is
+established between hd_cport_id and intf_cport_id.
+
+The Greybus bundles can also represent complex functionalities, like audio or
+camera. Normally such complex devices consist of multiple components working
+together, like sensors, DMA, bridge, DAIs, codecs, etc., and a single bundle
+device may look insufficient to represent them. But that's how Greybus
+represents such devices. The module side contains the real firmware which makes
+all these components work together and the module firmware takes inputs from the
+AP over the connections present within the bundle. For example, a bundle
+representing the camera will have two connections: data and management. All
+management instructions are sent to the module or configurations are received
+from the module using the management connection. And the data from the camera on
+the module is received over the data connection. The internals of how various
+components work together to represent the camera are hidden from Greybus and
+hence the AP.
+
+
+When a module and its interfaces are connected to the Greybus network (by
+attaching the module to the frame of the phone), the AP starts enumerating its
+interfaces over CPort0. The AP fetches a block of data from each interface,
+called the *Interface Manifest*. The manifest is a data structure that contains
+a manifest header along with a set of descriptors. The manifest allows the AP
+to learn about the capabilities of the interface.
+
+The following is a simple example of a raw manifest file that represents an
+interface which supports a single *Audio* bundle. The manifest file is then
+converted into a binary blob using the
+link:https://github.com/projectara/manifesto[Manifesto] library. In the
+following example, the bundle has two connections: Management and Data. Note
+that it is optional to add the control CPort0 to the manifest file.
+
+----
+; Simple Audio Interface Manifest
+;
+; Provided under the three clause BSD license found in the LICENSE file.
+
+[manifest-header]
+version-major = 0
+version-minor = 1
+
+[interface-descriptor]
+vendor-string-id = 1
+product-string-id = 2
+
+; Interface vendor string
+[string-descriptor 1]
+string = Project Ara
+
+; Interface product string
+[string-descriptor 2]
+string = Simple Audio Interface
+
+; Bundle 1: Audio class
+[bundle-descriptor 1]
+class = 0x12
+
+; Audio Management protocol on CPort 1
+[cport-descriptor 1]
+bundle = 1
+protocol = 0x12
+
+; Audio Data protocol on CPort 2
+[cport-descriptor 2]
+bundle = 1
+protocol = 0x13
+----
+
+
+Greybus Messages
+~~~~~~~~~~~~~~~~
+
+Greybus communication is built on UniPro messages, which are used to exchange
+information between the AP, the SVC, and the modules. Normally all
+communication is bidirectional, i.e. for every request message from the sender,
+the receiver responds with a response message. Which entity (AP, SVC, or
+module) can initiate a request message depends on the individual operation as
+defined by the respective protocol; for example, only the AP can initiate
+operations on the Control protocol. Some operations are unidirectional, where
+the receiver doesn't need to respond with a response message.
+
+Each message sent over UniPro begins with a short header, which is followed by
+operation-specific payload data. The message header is represented by the
+following structure:
+
+----
+struct gb_operation_msg_hdr {
+ __le16 size; /* Size in bytes of header + payload */
+ __le16 operation_id; /* Operation unique id */
+ __u8 type; /* E.g GB_I2C_TYPE_TRANSFER */
+ __u8 result; /* Result of request (in responses only) */
+ __u8 pad[2]; /* must be zero (ignore when read) */
+} __packed;
+----
+
+Here, *size* is the size of the header (8 bytes) plus size of the payload data.
+The size of payload data is defined by each operation of every protocol.
+
+The *operation_id* is a unique 16 bit number which is used to match request and
+response messages. The operation_id allows many operations to be "in flight" on
+a connection at once. The special ID 0 is reserved for unidirectional
+operations.
+
+The operation *type* is an 8 bit number that defines the type of the operation.
+The meaning of the type value depends on the protocol in use on the connection
+carrying the message. Only 127 operation types (0x01..0x7f) are available for a
+given protocol; type 0x00 is reserved. The most significant bit (0x80) of the
+operation type is used as a flag that distinguishes a request message from its
+response: for requests this bit is 0, for responses it is 1.
+
+The *result* field is ignored for the request messages and it contains the
+result of the requested operation in the response message.
+
+The Greybus messages (both request and response) are managed using the following
+structure within the Linux kernel:
+
+----
+struct gb_message {
+ struct gb_operation *operation;
+ struct gb_operation_msg_hdr *header;
+
+ void *payload;
+ size_t payload_size;
+
+ ...
+};
+----
+
+Here, *operation* is the operation to which the message belongs, *header* is the
+header to be sent over UniPro, *payload* is the payload to be sent following the
+header and *payload_size* is the size of the payload.
+
+The entire Greybus Operation (request + response) is managed using the following
+structure within the Linux kernel:
+
+----
+struct gb_operation {
+ struct gb_connection *connection;
+ struct gb_message *request;
+ struct gb_message *response;
+
+ u8 type;
+ u16 id;
+
+ ...
+};
+----
+
+Here, *connection* represents the communication path over which Unipro messages
+are sent, *request* and *response* represent the Greybus messages, and *type*
+and *id* are as described earlier in the message header.
+
+There are multiple helpers which can be used to send/receive Greybus messages
+over a connection, but most users end up using the following:
+
+----
+int gb_operation_sync_timeout(struct gb_connection *connection, int type,
+ void *request, int request_size, void *response,
+ int response_size, unsigned int timeout);
+----
+
+Here, *connection* represents the communication path over which UniPro
+messages are sent, *type* is the operation type, *request* is the request
+payload, *request_size* is the size of the request payload, *response* is the
+space for the response payload, *response_size* is the size of the expected
+response payload, and *timeout* is the timeout period for the operation in
+milliseconds. Typically a timeout of 1000 milliseconds is chosen.
+
+gb_operation_sync_timeout() first creates the operation and its messages, copies
+the request payload into the request message, and then sends the request message
+header and its payload over the Greybus connection. It then waits for "timeout"
+milliseconds for a response from the other side and errors out if no response is
+received within that time. Once the response is received, it checks the result
+field in the response header; if the result indicates an error,
+gb_operation_sync_timeout() errors out. Otherwise it copies the response
+payload into the memory pointed to by the "response" argument and then destroys
+the operation and message structures. It returns 0 on success or a negative
+error number otherwise.
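+
+As a rough usage sketch, a driver that wants to send a request carrying a
+one-byte payload and expecting no response payload might wrap the helper like
+this; the operation type, payload layout, and function names below are invented
+purely for illustration:
+
+----
+#include <linux/types.h>
+
+#include "greybus.h"
+
+/* Hypothetical request payload; real layouts come from the protocol spec. */
+struct example_set_state_request {
+	__u8	state;
+} __packed;
+
+static int example_set_state(struct gb_connection *connection, u8 state)
+{
+	struct example_set_state_request request = { .state = state };
+
+	/* Operation type 0x02 and the 1000 ms timeout are arbitrary choices. */
+	return gb_operation_sync_timeout(connection, 0x02,
+					 &request, sizeof(request),
+					 NULL, 0, 1000);
+}
+----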
+
+
+Greybus Protocols
+~~~~~~~~~~~~~~~~~
+
+The Greybus protocols define the layout and semantics of the Greybus messages
+that may be exchanged over a Greybus connection. Each Greybus protocol defines
+a set of operations, along with the formats of their request and response
+messages. It also states which side of the connection can initiate a request:
+the AP, the module, or the SVC. The Greybus protocols are broadly divided into
+three categories: Special protocols, Device class protocols, and Bridged PHY
+protocols.
+
+The *Special* protocols are the core Greybus protocols with administrative
+powers over the Greybus network. There are two special protocols: SVC and
+Control.
+
+The *SVC* protocol serves the purpose of communication between the AP and the
+SVC. The AP controls the network via the SVC using this protocol. The CPort0 of
+the APB is used for the SVC connection (don't confuse that with CPort0 of each
+module interface, which is used for the Control protocol). The main purpose of
+this protocol is to help the AP create routes between various CPorts, sense
+module insertion or removal, etc. The modules on the Greybus network should not
+implement it. Some of the operations allowed under this protocol are: Module
+inserted/removed events, Create/destroy route, Create/destroy connection,
+Interface eject, Interface activate, Interface resume, etc.
+
+The *Control* protocol serves the purpose of communication between the AP and
+the module's interfaces. The AP controls individual interfaces using this
+protocol. The main purpose of this protocol is to help the AP enumerate a new
+interface and learn about its capabilities. Only the AP can initiate operations
+(send requests) under this protocol and the module needs to respond to those
+requests. Some of the operations allowed under this protocol are: Get interface
+manifest, Bundle suspend/resume/activate/deactivate, Get bundle version, etc.
+
+
+As mentioned earlier, the *Device Class* protocols provide a device abstraction
+for the functionality commonly found on mobile handsets. A simple example of
+that can be the audio management protocol or the camera management protocol.
+Following are various types of such protocols implemented in the Linux kernel
+Greybus subsystem: Audio Management Protocol, Camera Management Protocol,
+Firmware Management Protocol, Firmware Download Protocol, Component
+Authentication Protocol, HID Protocol, Lights Protocol, Log Protocol, Power
+Supply Protocol, Loopback Protocol, Raw Protocol, and Vibrator Protocol.
+
+The *Bridged PHY* protocols provide communication with the modules on the
+Greybus network which do not comply with an existing device class Protocol, and
+which include integrated circuits using alternative physical interfaces to
+UniPro. Following are various types of such protocols implemented in the Linux
+kernel Greybus subsystem: USB Protocol, GPIO Protocol, SPI Protocol, SDIO
+Protocol, UART Protocol, PWM Protocol, and I2C Protocol.
+
+The individual protocols aren't described in great detail here to keep this
+article short.
+
+What's next
+~~~~~~~~~~~
+
+It would be interesting to do a couple of things going forward. To begin with,
+the Greybus core should be moved from the staging tree to the kernel's drivers
+directory, and the bundle drivers should be moved to their respective framework
+directories; for example, drivers/staging/greybus/gpio.c would become
+drivers/gpio/gpio-greybus.c. This would take a reasonable amount of time and
+effort.
+
+Later on, it would be nice to get Motorola's Moto Mods support merged into the
+kernel and to include its improvements in the Greybus subsystem. That will
+mostly depend on contributions from the Motorola community, though. As Project
+ARA is now discontinued, it would be quite interesting to find new targets
+(like the Internet of things) for the Greybus subsystem and adapt Greybus to
+support them. Some discussions are going on around that.
diff --git a/greybus/modules.png b/greybus/modules.png
new file mode 100644
index 0000000..ca17f4f
--- /dev/null
+++ b/greybus/modules.png
Binary files differ
diff --git a/greybus/my_raw.txt b/greybus/my_raw.txt
new file mode 100644
index 0000000..d8b6870
--- /dev/null
+++ b/greybus/my_raw.txt
@@ -0,0 +1,559 @@
+The Greybus Subsystem
+=====================
+
+The Linux kernel gained a new subsystem during the 4.9 release: *Greybus*.
+This article will briefly take you through its internals.
+
+Greybus was initially designed for Google's Project ARA smartphone (which is
+discontinued now), but the first (and only) product released with it is
+Motorola's Moto Mods. There are also discussions going on to evaluate the
+feasibility of using the robust protocols provided by Greybus in non-UniPro
+applications like IoT, and in other parts of the kernel that need to
+communicate in a platform-independent way.
+
+Initially, Greg Kroah-Hartman tried to merge the Greybus core directly into the
+drivers directory (drivers/greybus), but after some objections (people wanted to
+do more detailed reviews before merging it) everyone agreed to merge it into the
+staging tree (drivers/staging/greybus). Almost *2400* patches, developed over
+2.5 years, were merged, with contributions from more than 50 developers from at
+least five organizations (Google, Linaro, BayLibre, LeafLabs, and MMSolutions).
+There were a lot more developers and companies involved in developing other
+parts of the ARA software and hardware. Greybus developers also showed up in
+the list of most active developers (top four by changesets) for the 4.9 release.
+
+Greg made sure that Greybus was merged with all its history preserved, and he said:
+
+ "Because this was 2 1/2 years of work, with many many developers
+ contributing, I didn't want to flatten all of their effort into a few
+ small patches, as that wouldn't be very fair. So I've built a git tree
+ with all of the changes going back to the first commit, and merged it
+ into the kernel tree, just like btrfs was merged into the kernel."
+
+Jonathan Corbet wrote an article earlier on Greybus; readers may want to look
+at that to catch up on some history.
+
+
+UniPro And The Internals of the Greybus Subsystem
+-------------------------------------------------
+
+The Project ARA smartphone was designed to be customizable. The user can select
+a subset of physical modules from a wide range of modules, providing interesting
+capabilities (like cameras, speakers, batteries, displays, sensors, etc), and
+attach them to the frame of the phone. The modules can communicate with the main
+processors or other modules directly over the UniPro bus. The specifications of
+the UniPro bus are defined by the MIPI alliance. UniPro follows the architecture
+of the classical OSI network model, except that it has no application layer
+defined. And that's where Greybus fits in. Project Ara developers defined their
+own application layer for Unipro and that is known as *Greybus*.
+
+UniPro communication happens over bidirectional connections between entities,
+like the Modules on the ARA smartphone; it doesn't need to go through the
+processors. Each UniPro device has virtual ports within it, which can be seen
+as sub-addresses within the device. They are a lot like sockets and are called
+*Connection Ports* (CPorts). There is a switch on the bus that sets up the
+actual routes. Messages can pass at a rate of around 10Gb/s; the bus also has
+message prioritization, error handling, and notification of delivery problems,
+though UniPro doesn't support streams or multicast delivery.
+
+As the Greybus specification was initially written for the Project ARA
+smartphone, it is greatly inspired by ARA's design, where modules can be
+inserted into or removed from the phone's frame on the fly. A lot of effort
+has been put into making the specification as generic as possible, in order to
+make it fit other use cases as well. You will also notice a lot of similarities
+with the USB bus framework in the Linux kernel, as it was taken as a reference
+during the development of Greybus.
+
+Greybus specification provides:
+
+* Device discovery and description at runtime.
+* Network routing and housekeeping.
+* Class and Bridged PHY protocols, which devices use to talk to each other, and
+ the processors.
+
+The following figure gives a glimpse of how various parts of the kernel interact
+with the Greybus Subsystem.
+
+image::./greybus.png[title="The Greybus Subsystem",height=500,width=800,align="center"]
+
+The *Greybus Core* at the center of the figure is the soul of the Greybus
+subsystem.
+
+The *Application Processor* (AP) represents the group of processors which run
+the host operating system, Linux in our case.
+
+The Greybus Core implements the SVC protocol (described later), which is used by
+the Application Processor (AP) to communicate to the *Supervisory Controller*
+(SVC). The SVC represents an entity within the Greybus network which configures
+and controls the Greybus (UniPro) network, mostly based on the instructions from
+the AP. All module insertion and removal events are first reported to the SVC,
+which in turn informs the AP about them using the SVC protocol. The AP is
+responsible for administrating the Greybus network via the SVC.
+
+During initial development of the ARA smartphone, there were no SoCs available
+with inbuilt UniPro support. Separate hardware entities were designed to connect
+the AP to the UniPro network. These entities receive (non-UniPro) messages
+from the AP, then translate and forward them to the UniPro network; the same is
+also required in the other direction, i.e. receiving messages from UniPro and
+translating them for the AP. These entities are called *AP Bridge* host
+controllers (APBs). They can receive messages over USB and send them over
+UniPro, and vice versa. The AP isn't really part of the Greybus network and so
+isn't represented in the above picture; the APB is connected to one side of the
+USB subsystem and the AP is on the other side of it. The Greybus subsystem also
+supports SoCs with inbuilt UniPro support; they are represented by the *Native
+UniPro* host controller, and the AP can talk to them directly without the USB
+subsystem.
+
+During module initialization (after the module is detected on Greybus), the
+Greybus Core parses the module's *Manifest* (which describes the capabilities
+of the module, described later) and creates devices within the kernel to
+represent them.
+
+*Power management* of the whole UniPro network (i.e. AP, SVC and modules) is
+managed by the Greybus Core. During system suspend, the Greybus core puts the
+SVC and the modules into low-power states, and on system resume it brings the
+Greybus network back up. The Greybus Core also performs runtime PM for all
+individual entities; for example, if a module isn't currently being used, the
+Greybus core will power it off and bring it back up only when it is needed
+again.
+
+The Greybus Core also binds itself to the Linux kernel driver core and provides
+a sysfs interface at /sys/bus/greybus. The following depicts the sysfs hierarchy for
+a single AP Bridge (APB) connected to the AP. A single Module is accessible via
+the APB and the module presents a single Interface which contains two Bundles
+within it. The figure also represents the Control CPort per interface and the
+SVC per APB, along with a list of attributes for each entity. All of these
+entities will be described later in detail.
+
+----
+
+greybus/
+└── greybus1 (AP Bridge)
+ ├── 1-2 (Module)
+ │   ├── 1-2.2 (Interface)
+ │   │   ├── 1-2.2.1 (Bundle)
+ │   │   │   ├── bundle_class
+ │   │   │   ├── bundle_id
+ │   │   │   └── state
+ │   │   ├── 1-2.2.2 (Bundle)
+ │   │   │   ├── bundle_class
+ │   │   │   ├── bundle_id
+ │   │   │   └── state
+ │   │   ├── 1-2.2.ctrl (Control CPort)
+ │   │   │   ├── product_string
+ │   │   │   └── vendor_string
+ │   │   ├── ddbl1_manufacturer_id
+ │   │   ├── ddbl1_product_id
+ │   │   ├── interface_id
+ │   │   ├── product_id
+ │   │   ├── serial_number
+ │   │   └── vendor_id
+ │   ├── eject
+ │   ├── module_id
+ │   └── num_interfaces
+ ├── 1-svc (SVC)
+ │   ├── ap_intf_id
+ │   ├── endo_id
+ │   └── intf_eject
+ └── bus_id
+
+7 directories, 21 files
+
+----
+
+The functionality provided by the modules is exposed using Device class and
+Bridged PHY drivers. The *Device class drivers* implement protocols whose
+purpose is to provide a device abstraction for the functionality commonly
+found on mobile handsets; for example camera, battery, sensors, etc. The
+*Bridged PHY drivers* implement protocols whose purpose is to support
+communication with the modules on the Greybus network which do not comply with
+a Device class protocol, and which include integrated circuits using
+alternative physical interfaces to UniPro; for example GPIO, I2C, SPI, USB,
+etc. The modules which
+only implement Device class protocols are said to be device class conformant.
+Modules which implement any of the Bridged PHY protocols are said to be
+non-device class conformant. The Device class and Bridged PHY protocols will be
+listed later.
+
+Module Hierarchy
+~~~~~~~~~~~~~~~~
+
+A *Module* is the physical hardware entity that can be connected to or
+disconnected from the Greybus network, either statically (before powering on
+the system) or dynamically (while the system is running). Once the modules are
+connected to the Greybus network, the AP and the SVC enumerate them and fetch
+their per-interface manifests to learn about their capabilities.
+
+The following figure gives a glimpse of what the module hierarchy looks like
+in the Greybus subsystem.
+
+image::./modules.png[title="Module hierarchy",height=500,width=800,align="center"]
+
+Modules are represented within the Linux kernel by the *struct gb_module*.
+
+----
+struct gb_module {
+ struct device dev;
+ u8 module_id;
+ size_t num_interfaces;
+ struct gb_interface *interfaces[0];
+
+ ...
+};
+----
+
+Here, *dev* is the module's device structure, *module_id* is a unique 8 bit
+number assigned to the module by the SVC, *interfaces* are the Interfaces
+present within the module and *num_interfaces* is their count.
+
+
+The Greybus modules have electrical connectors on them, which connect them to
+the phone's frame. These electrical connectors are called Interface Blocks
+and are represented in software by the term *Interface*. A module can have one
+or more Interfaces. The Interface with the smallest interface ID is configured
+as the primary interface and all others are called secondary interfaces. The
+*module_id* is set to the ID of the primary interface.
+
+The primary interface is special: the AP receives the module insertion event
+with the ID of the primary interface, and the module can be ejected from the
+frame only using the primary interface. The interfaces can present any number
+of functionalities (like camera, audio, battery, etc.), as long as they can be
+supported with the bandwidth available to the respective Interface Block. The
+interfaces are represented within the Linux kernel by *struct gb_interface*.
+
+----
+struct gb_interface {
+ struct device dev;
+ struct gb_control *control;
+
+ struct list_head bundles;
+ struct list_head manifest_descs;
+ u8 interface_id;
+ struct gb_module *module;
+
+ ...
+};
+----
+
+Here, *dev* is the interface's device structure, *control* represents the
+Control connection (described later), *bundles* is the list containing bundles
+within the interface, *manifest_descs* is the list of descriptors created from
+the interface manifest, *interface_id* is the unique ID of the interface, and
+*module* is the pointer to the parent module structure. Both module ID and
+interface ID start from 0 and are unique within the Greybus network.
+
+
+A Greybus Interface can contain one or more *bundles*. Each bundle represents
+a logical Greybus device in the kernel. For example, an Interface with vibrator
+and battery functionalities will have two bundles: one for the vibrator and one
+for the battery functionality. Each bundle gets a *struct device* for itself
+and a Greybus driver binds to that device. The bundle ID is unique within an
+Interface and starts from 1; 0 is reserved. The bundles are represented within
+the Linux kernel by *struct gb_bundle*.
+
+----
+struct gb_bundle {
+ struct device dev;
+ struct gb_interface *intf;
+
+ u8 id;
+ u8 class;
+ size_t num_cports;
+ struct list_head connections;
+
+ ...
+};
+----
+
+Here, *dev* is the bundle's device structure, *intf* is the pointer to the
+parent interface, *id* is the unique ID of the bundle within the interface,
+*class* is the class type of the bundle (like camera or audio), *connections*
+is the list of connections within the bundle, and *num_cports* is the count of
+those connections.
+
+
+The Greybus driver is represented by the following structure, and it accepts
+the bundle structure as an argument in all its callbacks.
+
+----
+struct greybus_driver {
+ const char *name;
+
+ int (*probe)(struct gb_bundle *bundle,
+ const struct greybus_bundle_id *id);
+ void (*disconnect)(struct gb_bundle *bundle);
+
+ const struct greybus_bundle_id *id_table;
+
+ struct device_driver driver;
+};
+----
+
+Here, *name* is the name of the Greybus driver, *probe* and *disconnect* are the
+callbacks, *id_table* is the device bundle id table and *driver* is the generic
+device driver structure.
+
+
+The Greybus or UniPro *connection* is a bidirectional communication path between
+two CPorts. There can be one or more CPorts within a bundle. The communication
+over the connections is governed by a predefined set of operations and the
+semantics of those operations are defined by Greybus Protocols (covered later).
+Each CPort is managed by exactly one protocol. CPort numbers are unique within
+an interface. The first CPort within an interface is always the control CPort0
+(not part of any bundle); the rest of the CPorts are numbered starting from 1
+and must be part of a bundle. CPort0 of an interface is special: it is used for
+the management of the Interface and is governed by a special protocol, the
+Control Protocol (described later). The connections are represented within the
+Linux kernel by *struct gb_connection*.
+
+----
+struct gb_connection {
+ struct gb_host_device *hd;
+ struct gb_interface *intf;
+ struct gb_bundle *bundle;
+ u16 hd_cport_id;
+ u16 intf_cport_id;
+ struct list_head operations;
+
+ ...
+};
+----
+
+Here, *hd* represents the AP bridge through which the AP is communicating to the
+module, *intf* represents the parent interface, *bundle* represents the parent
+bundle, *hd_cport_id* represents the CPort ID of the AP bridge, *intf_cport_id*
+represents the CPort ID of the interface, and *operations* is the list of
+operations which are getting exchanged over the connection. The connection is
+established between hd_cport_id and intf_cport_id.
+
+The Greybus bundles can also represent complex functionalities, like audio or
+camera. Normally such complex devices consist of multiple components working
+together, like sensors, DMA, bridge, DAIs, codecs, etc., and a single bundle
+device may look insufficient to represent them. But that's how Greybus
+represents such devices. The module side contains the real firmware which makes
+all these components work together and the module firmware takes inputs from the
+AP over the connections present within the bundle. For example, a bundle
+representing the camera will have two connections: data and management. All
+management instructions are sent to the module or configurations are received
+from the module using the management connection. And the data from the camera on
+the module is received over the data connection. The internals of how various
+components work together to represent the camera are hidden from Greybus and
+hence the AP.
+
+
+When a module and its interfaces are connected to the Greybus network (by
+attaching the module to the frame of the phone), the AP starts enumerating its
+interfaces over CPort0. The AP fetches a block of data from each interface,
+called the *Interface Manifest*. The manifest is a data structure that contains
+a manifest header along with a set of descriptors. The manifest allows the AP
+to learn about the capabilities of the interface.
+
+The following is a simple example of a raw manifest file that represents an
+interface which supports a single *Audio* bundle. The manifest file is then
+converted into a binary blob using the Manifesto library. In the following
+example, the bundle has two connections: Management and Data. Note that it is
+optional to add the control CPort0 to the manifest file.
+
+----
+; Simple Audio Interface Manifest
+;
+; Provided under the three clause BSD license found in the LICENSE file.
+
+[manifest-header]
+version-major = 0
+version-minor = 1
+
+[interface-descriptor]
+vendor-string-id = 1
+product-string-id = 2
+
+; Interface vendor string
+[string-descriptor 1]
+string = Project Ara
+
+; Interface product string
+[string-descriptor 2]
+string = Simple Audio Interface
+
+; Bundle 1: Audio class
+[bundle-descriptor 1]
+class = 0x12
+
+; Audio Management protocol on CPort 1
+[cport-descriptor 1]
+bundle = 1
+protocol = 0x12
+
+; Audio Data protocol on CPort 2
+[cport-descriptor 2]
+bundle = 1
+protocol = 0x13
+----
+
+
+Greybus Messages
+~~~~~~~~~~~~~~~~
+
+Greybus communication is built on UniPro messages, which are used to exchange
+information between the AP, the SVC, and the modules. Normally all
+communication is bidirectional, i.e. for every request message from the sender,
+the receiver responds with a response message. Which entity (AP, SVC, or
+module) can initiate a request message depends on the individual operation as
+defined by the respective protocol; for example, only the AP can initiate
+operations on the Control protocol. Some operations are unidirectional, where
+the receiver doesn't need to respond with a response message.
+
+Each message sent over UniPro begins with a short header, which is followed by
+operation-specific payload data. The message header is represented by the
+following structure:
+
+----
+struct gb_operation_msg_hdr {
+ __le16 size; /* Size in bytes of header + payload */
+ __le16 operation_id; /* Operation unique id */
+ __u8 type; /* E.g GB_I2C_TYPE_TRANSFER */
+ __u8 result; /* Result of request (in responses only) */
+ __u8 pad[2]; /* must be zero (ignore when read) */
+} __packed;
+----
+
+Here, *size* is the size of the header (8 bytes) plus size of the payload data.
+The size of payload data is defined by each operation of every protocol.
+
+The *operation_id* is a unique 16 bit number which is used to match request and
+response messages. The operation_id allows many operations to be "in flight" on
+a connection at once. The special ID 0 is reserved for unidirectional
+operations.
+
+The operation *type* is an 8 bit number that defines the type of the operation.
+The meaning of the type value depends on the protocol in use on the connection
+carrying the message. Only 127 operation types (0x01..0x7f) are available for a
+given protocol; type 0x00 is reserved. The most significant bit (0x80) of the
+operation type is used as a flag that distinguishes a request message from its
+response: for requests this bit is 0, for responses it is 1.
+
+The *result* field is ignored for the request messages and it contains the
+result of the requested operation in the response message.
+
+The Greybus messages (both request and response) are managed using the following
+structure within the Linux kernel:
+
+----
+struct gb_message {
+ struct gb_operation *operation;
+ struct gb_operation_msg_hdr *header;
+
+ void *payload;
+ size_t payload_size;
+
+ ...
+};
+----
+
+Here, *operation* is the operation to which the message belongs, *header* is the
+header to be sent over UniPro, *payload* is the payload to be sent following the
+header and *payload_size* is the size of the payload.
+
+The entire Greybus Operation (request + response) is managed using the following
+structure within the Linux kernel:
+
+----
+struct gb_operation {
+ struct gb_connection *connection;
+ struct gb_message *request;
+ struct gb_message *response;
+
+ u8 type;
+ u16 id;
+
+ ...
+};
+----
+
+Here, *connection* represents the communication path over which Unipro messages
+are sent, *request* and *response* represent the Greybus messages, and *type*
+and *id* are as described earlier in the message header.
+
+There are multiple helpers which can be used to send/receive Greybus messages
+over a connection, but most users end up using the following:
+
+----
+int gb_operation_sync_timeout(struct gb_connection *connection, int type,
+ void *request, int request_size, void *response,
+ int response_size, unsigned int timeout);
+----
+
+Here, *connection* represents the communication path over which UniPro
+messages are sent, *type* is the operation type, *request* is the request
+payload, *request_size* is the size of the request payload, *response* is the
+space for the response payload, *response_size* is the size of the expected
+response payload, and *timeout* is the timeout period for the operation in
+milliseconds. Typically a timeout of 1000 milliseconds is chosen.
+
+gb_operation_sync_timeout() first creates the operation and its messages, copies
+the request payload into the request message, and then sends the request message
+header and its payload over the Greybus connection. It then waits for "timeout"
+milliseconds for a response from the other side and errors out if no response is
+received within that time. Once the response is received, it checks the result
+field in the response header; if the result indicates an error,
+gb_operation_sync_timeout() errors out. Otherwise it copies the response
+payload into the memory pointed to by the "response" argument and then destroys
+the operation and message structures. It returns 0 on success or a negative
+error number otherwise.
+
+
+Greybus Protocols
+~~~~~~~~~~~~~~~~~
+
+The Greybus protocols define the layout and semantics of the Greybus messages
+that may be exchanged over a Greybus connection. Each Greybus protocol defines
+a set of operations, along with the formats of their request and response
+messages. It also states which side of the connection can initiate a request:
+the AP, the module, or the SVC. The Greybus protocols are broadly divided into
+three categories: Special protocols, Device class protocols, and Bridged PHY
+protocols.
+
+The *Special* protocols are the core Greybus protocols with administrative
+powers over the Greybus network. There are two special protocols: SVC and
+Control.
+
+The *SVC* protocol serves the purpose of communication between the AP and the
+SVC. The AP controls the network via the SVC using this protocol. The CPort0 of
+the APB is used for the SVC connection (don't confuse that with CPort0 of each
+module interface, which is used for the Control protocol). The main purpose of
+this protocol is to help the AP create routes between various CPorts, sense
+module insertion or removal, etc. The modules on the Greybus network should not
+implement it. Some of the operations allowed under this protocol are: Module
+inserted/removed events, Create/destroy route, Create/destroy connection,
+Interface eject, Interface activate, Interface resume, etc.
+
+The *Control* protocol serves the purpose of communication between the AP and
+the module's interfaces. The AP controls individual interfaces using this
+protocol. The main purpose of this protocol is to help the AP enumerate a new
+interface and learn about its capabilities. Only the AP can initiate operations
+(send requests) under this protocol and the module needs to respond to those
+requests. Some of the operations allowed under this protocol are: Get interface
+manifest, Bundle suspend/resume/activate/deactivate, Get bundle version, etc.
+
+
+As mentioned earlier, the *Device Class* protocols provide a device abstraction
+for the functionality commonly found on mobile handsets. A simple example of
+that can be the audio management protocol or the camera management protocol.
+Following are various types of such protocols implemented in the Linux kernel
+Greybus subsystem: Audio Management Protocol, Camera Management Protocol,
+Firmware Management Protocol, Firmware Download Protocol, Component
+Authentication Protocol, HID Protocol, Lights Protocol, Log Protocol, Power
+Supply Protocol, Loopback Protocol, Raw Protocol, and Vibrator Protocol.
+
+The *Bridged PHY* protocols provide communication with the modules on the
+Greybus network which do not comply with an existing device class Protocol, and
+which include integrated circuits using alternative physical interfaces to
+UniPro. Following are various types of such protocols implemented in the Linux
+kernel Greybus subsystem: USB Protocol, GPIO Protocol, SPI Protocol, SDIO
+Protocol, UART Protocol, PWM Protocol, and I2C Protocol.
+
+The individual protocols aren't described in great detail here to keep this
+article short.
+
+Thanks
+------
+
+Thanks to Jonathan Corbet for his help in reviewing this article.
diff --git a/opp/compare.txt b/opp/compare.txt
new file mode 100644
index 0000000..a9a24c4
--- /dev/null
+++ b/opp/compare.txt
@@ -0,0 +1,273 @@
+DVFS simplified with the OPP library
+====================================
+
+Until Linux kernel release 4.5, the operating performance points (OPP) framework
+was acting as a helper library that provided tables of voltage-frequency pairs
+(with some additional information) for Linux kernel devices. Linux kernel
+frameworks, like cpufreq and devfreq, used these OPP tables to perform dynamic
+voltage and frequency scaling (DVFS) for the devices. The OPP framework creates
+these tables dynamically via platform-specific code and statically from Device
+Tree (DT) blobs.
+
+The OPP framework gained the infrastructure to do DVFS on behalf of a consumer
+driver during the 4.6 development cycle. This helped reduce the complexity of
+the device drivers, which can now focus on their platform-specific details.
+
+The rest of this article discusses what has changed and how we can use it in
+our device drivers.
+
+Operating performance point (OPP)
+---------------------------------
+
+SoCs have become very complex and power-efficient nowadays. There are multiple
+sub-modules within an SoC that work in conjunction all the time, but not all of
+them are required to function at their highest performance frequency and
+voltage levels at all times, as that makes them less power-efficient. The
+devices (like CPUs, GPUs, I/O devices, etc.) are capable of working at a range
+of frequency and voltage pairs: they stay at lower frequencies when the system
+load is low and at higher frequencies otherwise.
+
+The set of discrete tuples, consisting of (but not limited to) frequency and
+voltage pairs, that the device supports is called its 'Operating Performance
+Points' (OPPs). For example, a CPU core which supports {1.0 GHz at a minimum
+voltage of 1.0 V}, {1.1 GHz at a minimum voltage of 1.1 V}, and {1.2 GHz at a
+minimum voltage of 1.3 V} can be represented by these OPP tuples:
+
+----
+ Hz uV
+{1000000000, 1000000}
+{1100000000, 1100000}
+{1200000000, 1200000}
+----
+
+These tuples may contain more configurables as well, for example voltage levels
+for multiple power-supplies. The example at the end of this article shows how
+the OPP nodes are present in a Device Tree (DT) blob.
+
+
+Before Linux kernel release 4.6, the OPP framework was responsible for creating
+an OPP table by parsing it from DT (or via platform-specific code) and for
+providing a set of helpers to inquire about the available OPPs, for example
+finding the floor or ceiling OPP corresponding to a target frequency. The
+consumer drivers of the OPP library, like .../cpufreq/cpufreq-dt.c, use the
+helpers to find an OPP corresponding to the target frequency and use it to
+configure the device's clock and power-supplies (if required).
+
+
+What's new
+----------
+
+For the most common configuration (with at most one power-supply for the
+device), all consumer drivers had pretty much identical DVFS code. And it made
+sense to let the OPP core configure the device to a particular OPP and simplify
+the consumer drivers by removing such code from them. During the 4.6 development cycle, the OPP core gained the functionality to
+perform DVFS on behalf of the consumer driver. The consumer driver needs to pass
+a target frequency, and the OPP core finds and sets the best possible OPP
+corresponding to it.
+
+In order to perform DVFS on behalf of the consumer driver, the OPP core needs
+some of the device's resources. Some of them are acquired automatically by the
+OPP core, while the core needs help from the driver to get others. And it is
+important for driver writers to understand the expectations of the OPP core
+before they try to use the OPP core to do DVFS for their devices.
+
+
+In order to change the frequency of a device, the OPP core needs a pointer to
+the device's 'struct clk'. The OPP core gets this automatically by calling
+clk_get() using the device's 'struct device' pointer. The consumer driver must make sure that the device has a
+valid clock registered for it with the clock framework, otherwise the OPP core
+will fail to do DVFS for the device.
+
+Voltage scaling isn't always required while doing frequency scaling and so
+acquiring the power-supply resources is optional. But for the platforms that
+need to do voltage scaling, the OPP core needs some input from the consumer
+driver. The OPP core supports devices that need no power-supply, a single
+power-supply, or multiple power-supplies. The consumer driver needs to provide
+the OPP core with the names of all the power-supplies that must be configured
+to perform DVFS for the device. The consumer driver needs to call the routine
+below only once per device, and the OPP core will acquire the required
+power-supply resources for the device.
+
+----
+struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[], unsigned int count);
+----
+
+Here, 'dev' is the pointer to the device structure, 'names' is the pointer to an
+array of power-supply names and 'count' is the number of entries in that array.
+This routine returns a pointer to the 'struct opp_table' for the device on
+success and an error number (cast as a pointer) if something went wrong. The
+order in which the names of the power-supplies are present in this array is
+significant. The OPP core assumes that the entries in the 'opp-microvolt'
+property in the OPP table in DT will be present in the same order as in the
+array. Refer to the example at the end for more on the 'opp-microvolt' property. If
+the helper 'dev_pm_opp_set_regulators()' isn't called for a device, the OPP core
+assumes that the device doesn't need to participate in voltage scaling and that
+frequency scaling can be done independently.
+
+The OPP core in turn calls the below routine for each string present in the
+'names' array. If the OPP core fails to get the regulator corresponding to any
+of the strings, it returns with an error from 'dev_pm_opp_set_regulators()'.
+
+----
+regulator_get_optional(dev, names[i]);
+----
+
+Here, 'dev' is the pointer to the device structure and 'names[i]' represents an
+entry in the 'names' array.
+
+Once the consumer driver is done with the OPP table, it should free the
+resources acquired by the OPP core using the following routine:
+
+----
+void dev_pm_opp_put_regulators(struct opp_table *opp_table);
+----
+
+Here, 'opp_table' is the pointer to the OPP table, earlier returned by
+'dev_pm_opp_set_regulators()'.
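+
+Putting the two calls together, a consumer driver with a single (hypothetical)
+'vdd' supply might acquire and release the regulator resources roughly as shown
+below; the supply name and the use of driver data to remember the OPP table are
+assumptions made only for this sketch:
+
+----
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/pm_opp.h>
+
+/* Hypothetical consumer driver bits: assumes a single "vdd" power-supply. */
+static const char * const example_supplies[] = { "vdd" };
+
+static int example_dvfs_init(struct device *dev)
+{
+	struct opp_table *opp_table;
+
+	opp_table = dev_pm_opp_set_regulators(dev, example_supplies,
+					      ARRAY_SIZE(example_supplies));
+	if (IS_ERR(opp_table))
+		return PTR_ERR(opp_table);
+
+	/* Remember the table (here in driver data) for the cleanup path. */
+	dev_set_drvdata(dev, opp_table);
+	return 0;
+}
+
+static void example_dvfs_exit(struct device *dev)
+{
+	dev_pm_opp_put_regulators(dev_get_drvdata(dev));
+}
+----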
+
+
+*Let's do DVFS now!*
+
+Once the OPP core has all the resources it needs to do DVFS for a device, the
+consumer drivers can use the helpers described below to let the OPP core perform
+DVFS on its behalf. DVFS methods differ a bit depending on the number of
+power-supplies required to be configured for the device. In the most common cases
+the OPP core either needs to do only frequency scaling (no power-supply) or need
+to do voltage scaling for a single power-supply along with it. For such
+platforms, the consumer driver needs to call the below helper to let the OPP
+core do DVFS for the device.
+
+----
+int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
+----
+
+Here, 'dev' is the pointer to the device structure, and 'target_freq' is the
+frequency we need to program the device for. This routine configures the device
+for the 'ceil' OPP, i.e. the OPP with the lowest frequency greater than or
+equal to the target frequency. This routine returns zero on success and a
+negative error number otherwise.
+
+If the device doesn't need to do voltage scaling at all, then
+'dev_pm_opp_set_rate()' can be called without calling
+'dev_pm_opp_set_regulators()' earlier. Otherwise, 'dev_pm_opp_set_regulators()'
+must be called successfully before calling 'dev_pm_opp_set_rate()'. If the
+target OPP has a higher frequency than the current OPP, then
+'dev_pm_opp_set_rate()' does voltage scaling before doing frequency scaling.
+Otherwise, frequency scaling is done before voltage scaling.
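+
+As a rough sketch (not taken from any particular driver; 'foo_set_target()' and
+'target_khz' are hypothetical names), a consumer's frequency-change path built
+on this helper might look like:
+
+----
+/* Hypothetical consumer code: scale a device to target_khz using the OPP core */
+static int foo_set_target(struct device *dev, unsigned int target_khz)
+{
+	unsigned long target_freq = target_khz * 1000UL;	/* kHz -> Hz */
+	int ret;
+
+	/* The OPP core picks the ceil OPP and orders the clock/voltage changes */
+	ret = dev_pm_opp_set_rate(dev, target_freq);
+	if (ret)
+		dev_err(dev, "Failed to switch to %lu Hz: %d\n", target_freq, ret);
+
+	return ret;
+}
+----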
+
+
+The handling is a bit different in the complex cases where voltage scaling of
+multiple power-supplies is required. The order in which multiple power-supplies
+need to be programmed is pretty much platform specific, and it is very difficult
+to come up with common code that works for all of them. To simplify things, the
+OPP core allows platform specific 'set_opp()' callbacks to be registered, which
+will be called by the OPP core from within 'dev_pm_opp_set_rate()' at the time
+of DVFS.
+
+The platform specific callback can be registered using:
+
+----
+struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data));
+----
+
+Here, 'dev' is the pointer to the device structure, and 'set_opp' is the
+platform specific callback. The callback takes 'struct dev_pm_set_opp_data' as
+argument, which contains all the configuration the callback needs to do DVFS,
+and returns zero on success and a negative error number otherwise. This helper
+returns a pointer to the 'struct opp_table' for the device on success and an
+error number (encoded as a pointer with ERR_PTR()) if something goes wrong.
+
+The platform specific callback should be unregistered using the following
+routine after the consumer driver is done with the OPP table:
+
+----
+void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table);
+----
+
+Here, 'opp_table' is the pointer to the OPP table, earlier returned by
+'dev_pm_opp_register_set_opp_helper()'.
+
+
+Here is an example that connects the dots to explain how it all fits together.
+We have two CPU devices here (that share their clock/voltage rails) and we need
+to configure a single power-supply to perform DVFS for them.
+
+- Device Tree using 'operating-points-v2' bindings
+
+----
+ / {
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ cpu@0 {
+ compatible = "arm,cortex-a9";
+ reg = <0>;
+ next-level-cache = <&L2>;
+ clocks = <&clk_controller 0>;
+ clock-names = "cpu";
+ vdd-supply = <&vdd_supply0>;
+ operating-points-v2 = <&cpu_opp_table>;
+ };
+
+ cpu@1 {
+ compatible = "arm,cortex-a9";
+ reg = <1>;
+ next-level-cache = <&L2>;
+ clocks = <&clk_controller 0>;
+ clock-names = "cpu";
+ vdd-supply = <&vdd_supply0>;
+ operating-points-v2 = <&cpu_opp_table>;
+ };
+ };
+
+ cpu_opp_table: opp_table {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp@1000000000 {
+ opp-hz = /bits/ 64 <1000000000>;
+ opp-microvolt = <990000 1000000 1050000>;
+ opp-microamp = <70000>;
+ clock-latency-ns = <300000>;
+ opp-suspend;
+ };
+ opp@1100000000 {
+ opp-hz = /bits/ 64 <1100000000>;
+ opp-microvolt = <1090000 1100000 1150000>;
+ opp-microamp = <80000>;
+ clock-latency-ns = <310000>;
+ };
+ opp@1200000000 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <1190000 1200000 1250000>;
+ opp-microamp = <90000>;
+ clock-latency-ns = <290000>;
+ turbo-mode;
+ };
+ };
+ };
+----
+
+- Platform specific code
+
+----
+	const char *name[] = {"vdd"};
+	struct opp_table *opp_table;
+
+	opp_table = dev_pm_opp_set_regulators(dev, name, ARRAY_SIZE(name));
+	if (IS_ERR(opp_table))
+		dev_err(dev, "Failed to set regulators: %ld\n", PTR_ERR(opp_table));
+----
+
+- Consumer driver responsible for DVFS
+
+----
+ ret = dev_pm_opp_set_rate(dev, target_freq);
+ if (ret)
+ dev_err(dev, "Failed to set rate: %d\n", ret);
+----
+
+With these enhancements in the OPP core, which use standard interfaces like
+clocks and regulators, the device drivers are simplified to a great extent.
+Going forward, we should enhance the OPP core further to keep all future
+DVFS-related configuration in a single place.
diff --git a/opp/corbet_edited.txt b/opp/corbet_edited.txt
new file mode 100644
index 0000000..45c521d
--- /dev/null
+++ b/opp/corbet_edited.txt
@@ -0,0 +1,260 @@
+Device power management with the OPP library
+
+Until Linux kernel release 4.5, the operating performance points (OPP) framework
+was acting as a helper library that provided tables of voltage-frequency pairs
+(with some additional information) for the kernel. Kernel frameworks, like
+cpufreq and devfreq, used these OPP tables to perform dynamic voltage and
+frequency scaling (DVFS) for the devices. The OPP framework creates these tables
+dynamically via platform specific code and statically from Device Tree (DT)
+blobs.
+
+The OPP framework gained the infrastructure to do DVFS on behalf of a consumer
+driver during the 4.6 development cycle. This helped reduce the complexity of
+the device drivers, which can now focus on their platform specific details.
+
+The rest of this article discusses what has changed and how we can use it for
+our device drivers.
+
+Operating performance points
+
+Systems on chips (SoCs) have become increasingly complex and power-efficient.
+There are multiple sub-modules within a SoC that work in conjunction, but not
+all of them are required to function at their highest performance frequency and
+voltage levels, as that can be less power-efficient. Devices like CPUs, GPUs,
+and I/O devices have the capability of working at a range of frequency and
+voltage pairs. They should stay at lower frequencies when the system load is low
+and at higher frequencies otherwise.
+
+The set of discrete tuples, consisting of frequency and voltage pairs, that the
+device supports is called its "operating performance points" (OPPs). For
+example, a CPU core which can operate at 1.0GHz at a minimum voltage of 1.0V,
+1.1GHz at 1.1V, and 1.2GHz at 1.2V can be represented by these OPP tuples:
+
+----
+ Hz uV
+ 1000000000 1000000
+ 1100000000 1100000
+ 1200000000 1200000
+----
+
+These tuples may contain more configurable values as well, for example voltage
+levels for multiple power supplies. The example at the end of this article shows
+how the OPP nodes are present in a device tree (DT) blob.
+
+Before the 4.6 kernel, the OPP framework was responsible for creating an OPP
+table by parsing the device tree (or via the platform-specific code) and for
+providing a set of helpers to inquire about the target OPPs, for example,
+finding the floor or ceiling OPP corresponding to the target frequency. The
+consumer drivers of the OPP library used the helpers to find an OPP
+corresponding to the target frequency and used it to configure the device's
+clock and power supplies (if required).
+
+What's new
+
+For the most common configurations (with at most one power supply for the
+device), all consumer drivers had pretty much identical DVFS code. So it made
+sense to let the OPP core configure the device to a particular OPP and simplify
+the drivers by removing such code from them. During the 4.6 development cycle,
+the OPP core thus gained the functionality to perform DVFS on behalf of device
+drivers. Those drivers need to pass a target frequency, and the OPP core will
+find and set the best possible OPP corresponding to that.
+
+In order to perform DVFS on behalf of device drivers, the OPP core needs some of
+the device's resources. Some of them are acquired automatically by the OPP core,
+while the core needs help from the driver to get others. It is important for
+driver writers to understand the expectations of the OPP core before they try to
+use it to do DVFS for their devices.
+
+In order to change the frequency of a device, the OPP core needs a pointer to
+the struct clk for the device. The OPP core gets this automatically by calling
+clk_get() using the device's struct device pointer. The driver must make sure
+that the device has a valid clock registered for it with the clock framework,
+otherwise the OPP core will fail to do DVFS for the device.
+
+Voltage scaling isn't always required while doing frequency scaling, so
+acquiring the power-supply resources is optional. But for platforms that need to
+do voltage scaling, the OPP core needs some input from the driver. The OPP core
+supports devices that don't need a power supply, or that need one or more
+supplies. The driver needs to provide the OPP core with the names of all the
+power supplies that must be configured to perform DVFS for the device, using:
+
+----
+ struct opp_table *dev_pm_opp_set_regulators(struct device *dev,
+ const char * const names[],
+ unsigned int count);
+----
+
+Here, dev is the pointer to the device structure, names is the pointer to an
+array of power-supply names and count is the number of entries in that array.
+This routine returns a pointer to the struct opp_table for the device on success
+and an error number (using ERR_PTR()) if something goes wrong. The order in
+which the names of the power supplies are present in this array is significant.
+The OPP core assumes that the entries in the opp-microvolt property in the OPP
+table in DT will be present in the same order as in the array. Refer to the
+example at the end for more on the opp-microvolt property. If this function
+isn't called for a device, the OPP core assumes that the device doesn't need to
+participate in voltage scaling and that frequency scaling can be done
+independently.
+
+The OPP core in turn calls regulator_get_optional() for each string present in
+the names array. If the OPP core fails to get the regulator corresponding to any
+of the strings, it returns with an error.
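+
+In other words, for each entry in the array the OPP core effectively does:
+
+----
+    regulator_get_optional(dev, names[i]);
+----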
+
+Once the consumer driver is done with the OPP table, it should free the
+resources acquired by the OPP core using the following routine:
+
+----
+ void dev_pm_opp_put_regulators(struct opp_table *opp_table);
+----
+
+Here, opp_table is the pointer to the OPP table, earlier returned by
+dev_pm_opp_set_regulators().
+
+Performing DVFS
+
+Once the OPP core has all the resources it needs to do DVFS for a device, the
+consumer drivers can use the helpers described below to let the OPP core perform
+DVFS on their behalf. DVFS methods differ a bit depending on the number of power
+supplies required to be configured for the device. In the most common cases, the
+OPP core either needs to do only frequency scaling (no power supply) or needs to
+do voltage scaling for a single power supply along with it. For such platforms,
+the driver needs to call this helper to let the OPP core do DVFS for the device:
+
+----
+ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
+----
+
+Where dev is the pointer to the device structure, and target_freq is the
+frequency we need to program the device for. This routine configures the device
+for the OPP with the lowest frequency greater than or equal to the target
+frequency. This routine returns zero on success and a negative error number
+otherwise.
+
+If the device doesn't need to do voltage scaling at all, then
+dev_pm_opp_set_rate() can be called without calling dev_pm_opp_set_regulators()
+earlier. Otherwise, dev_pm_opp_set_regulators() must be called successfully
+before calling dev_pm_opp_set_rate(). If the target OPP has a higher frequency
+than the current OPP, then dev_pm_opp_set_rate() does voltage scaling before
+doing frequency scaling. Otherwise, frequency scaling is done before voltage
+scaling.
+
+The handling is a bit different in the complex cases where voltage scaling of
+multiple power supplies is required. The order in which multiple power supplies
+need to be programmed is platform-specific and it is difficult to come up with
+common code that can work in all cases. To simplify things, the OPP core allows
+platform-specific set_opp() callbacks to be registered; these will be called by
+the OPP core from within dev_pm_opp_set_rate() at the time of DVFS. This
+callback can be registered using:
+
+----
+ struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev,
+ int (*set_opp)(struct dev_pm_set_opp_data *data));
+----
+
+Here, dev is the pointer to the device structure, and set_opp() is the
+platform-specific callback. The callback takes struct dev_pm_set_opp_data as its
+argument, which contains all the configuration the callback needs to do DVFS,
+and returns zero on success and a negative error number otherwise. This helper
+returns a pointer to the struct opp_table for the device on success and an
+error number (using ERR_PTR()) if something goes wrong.
+
+The platform-specific callback should be unregistered using the following
+routine after the consumer driver is done with the OPP table:
+
+----
+ void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table);
+----
+
+Here, opp_table is the pointer to the OPP table, earlier returned by
+dev_pm_opp_register_set_opp_helper().
+
+Connecting it all together
+
+Here is an example that connects the dots to explain how it all fits together.
+We have two CPU devices here (that share their clock/voltage rails) and we need
+to configure a single power supply to perform DVFS for them. The device-tree
+fragment describing the CPUs themselves would be:
+
+----
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ cpu@0 {
+ compatible = "arm,cortex-a9";
+ reg = <0>;
+ next-level-cache = <&L2>;
+ clocks = <&clk_controller 0>;
+ clock-names = "cpu";
+ vdd-supply = <&vdd_supply0>;
+ operating-points-v2 = <&cpu_opp_table>;
+ };
+
+ cpu@1 {
+ compatible = "arm,cortex-a9";
+ reg = <1>;
+ next-level-cache = <&L2>;
+ clocks = <&clk_controller 0>;
+ clock-names = "cpu";
+ vdd-supply = <&vdd_supply0>;
+ operating-points-v2 = <&cpu_opp_table>;
+ };
+ };
+----
+
+These definitions reference cpu_opp_table, which is a table describing the valid
+operating points for these CPUs; it is also found in the device tree:
+
+----
+ cpu_opp_table: opp_table {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp@1000000000 {
+ opp-hz = /bits/ 64 <1000000000>;
+ opp-microvolt = <990000 1000000 1050000>;
+ opp-microamp = <70000>;
+ clock-latency-ns = <300000>;
+ opp-suspend;
+ };
+ opp@1100000000 {
+ opp-hz = /bits/ 64 <1100000000>;
+ opp-microvolt = <1090000 1100000 1150000>;
+ opp-microamp = <80000>;
+ clock-latency-ns = <310000>;
+ };
+ opp@1200000000 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <1190000 1200000 1250000>;
+ opp-microamp = <90000>;
+ clock-latency-ns = <290000>;
+ turbo-mode;
+ };
+ };
+----
+
+The platform-specific code needed to set up DVFS would look something like:
+
+----
+	const char *name[] = {"vdd"};
+	struct opp_table *opp_table;
+
+	opp_table = dev_pm_opp_set_regulators(dev, name, ARRAY_SIZE(name));
+	if (IS_ERR(opp_table))
+		dev_err(dev, "Failed to set regulators: %ld\n", PTR_ERR(opp_table));
+----
+
+The driver responsible for voltage and frequency scaling would then do something
+like this:
+
+----
+ ret = dev_pm_opp_set_rate(dev, target_freq);
+ if (ret)
+ dev_err(dev, "Failed to set rate: %d\n", ret);
+----
+
+With these enhancements in the OPP core, which use standard interfaces like
+clocks and regulators, the device drivers are simplified to a great extent.
+Going forward, we should enhance the OPP core further to keep all future
+DVFS-related configuration in a single place.
diff --git a/opp/intro.txt b/opp/intro.txt
new file mode 100644
index 0000000..2e0a1b5
--- /dev/null
+++ b/opp/intro.txt
@@ -0,0 +1,16 @@
+During the link:https://lwn.net/Articles/687511[4.6] development cycle, the OPP
+framework gained the infrastructure to do
+link:https://en.wikipedia.org/wiki/Dynamic_voltage_scaling[DVFS] on behalf of
+device drivers. This helps in reducing the complexity of those drivers, which
+can now focus on their platform specific details. The rest of this article
+discusses what has changed and how we can use it to simplify our device drivers.
+
+Until Linux kernel release link:https://lwn.net/Articles/679931[4.5], the
+operating performance points (OPP) framework was acting as a helper library that
+provided tables of voltage-frequency pairs (with some additional information)
+for the kernel. Kernel frameworks, like cpufreq and devfreq, used these OPP
+tables to perform dynamic voltage and frequency scaling (DVFS) for the devices.
+The OPP framework creates these tables dynamically via platform specific code
+and statically from Device Tree
+(link:https://www.kernel.org/doc/Documentation/devicetree/usage-model.txt[DT])
+blobs.
diff --git a/opp/use_opp_to_do_dvfs.html b/opp/use_opp_to_do_dvfs.html
new file mode 100644
index 0000000..955c997
--- /dev/null
+++ b/opp/use_opp_to_do_dvfs.html
@@ -0,0 +1,279 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta name="generator" content="AsciiDoc 8.6.9">
+<title>DVFS simplified with the OPP library</title>
+</head>
+<body>
+<h1>DVFS simplified with the OPP library</h1>
+<p>
+</p>
+<a name="preamble"></a>
+<p>Until Linux kernel release <a href="https://lwn.net/Articles/679931">4.5</a>, the
+operating performance points (OPP) framework was acting as a helper library that
+provided tables of voltage-frequency pairs (with some additional information)
+for Linux kernel devices. Linux kernel frameworks, like cpufreq and devfreq,
+used these OPP tables to perform dynamic voltage and frequency scaling
+(<a href="https://en.wikipedia.org/wiki/Dynamic_voltage_scaling">DVFS</a>) for the
+devices. The OPP framework creates these tables dynamically via platform
+specific code and statically from Device Tree
+(<a href="https://www.kernel.org/doc/Documentation/devicetree/usage-model.txt">DT</a>)
+blobs.</p>
+<p>The OPP framework gained the infrastructure to do DVFS on behalf of a consumer
+driver, during the <a href="https://lwn.net/Articles/687511">4.6</a> development cycle.
+This helped reducing the complexity of the device drivers, which can focus on
+their platform specific details now.</p>
+<p>The rest of this article discusses what has changed and how can we use it for
+our device drivers.</p>
+<hr>
+<h2><a name="_operating_performance_point_opp"></a>Operating performance point (OPP)</h2>
+<p>The <a href="https://en.wikipedia.org/wiki/System_on_a_chip">SoCs</a> have become very
+complex and power-efficient nowadays. There are multiple sub-modules within
+the SoC that work in conjunction all the time. But not all of them are required
+to function at their highest performance
+<a href="https://en.wikipedia.org/wiki/Frequency">frequency</a> and
+<a href="https://en.wikipedia.org/wiki/Voltage">voltage</a> levels, as that makes them
+less power-efficient. The devices (like
+<a href="https://en.wikipedia.org/wiki/Central_processing_unit">CPUs</a>,
+<a href="https://en.wikipedia.org/wiki/Graphics_processing_unit">GPUs</a>, IO devices,
+etc.) have the capability of working on a range of frequency and voltage pairs.
+They stay at lower frequencies when the system load is low and at higher
+frequencies otherwise.</p>
+<p>The set of discrete tuples consisting of, but not limited to, frequency and
+voltage pairs that the device supports are called <em>Operating Performance Points</em>
+(AKA OPPs).</p>
+<p>For example, a CPU core which supports: {1.0 GHz at minimum voltage 1.0 V}, {1.1
+GHz at minimum voltage 1.1 V}, and {1.2 GHz at minimum voltage 1.2 V} can be
+represented by these OPP tuples:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code> Hz uV
+{1000000000, 1000000}
+{1100000000, 1100000}
+{1200000000, 1200000}</code></pre>
+</td></tr></table>
+<p>These tuples may contain more configurables as well, for example voltage levels
+for multiple power-supplies. The example at the end of this article shows how
+the OPP nodes are present in a Device Tree (DT) blob.</p>
+<p>Before Linux kernel release 4.6, the OPP framework was responsible for creating an
+OPP table by parsing the OPP table from DT (or via the platform specific code)
+and provide a set of helpers to inquire about the target OPPs. For example,
+finding floor or ceil OPP corresponding to the target frequency.</p>
+<p>The consumer drivers of the OPP library, like &#8230;/cpufreq/cpufreq-dt.c, use the
+helpers to find an OPP corresponding to the target frequency and use it to
+configure the device&#8217;s clock and power-supplies (if required).</p>
+<hr>
+<h2><a name="_what_8217_s_new"></a>What&#8217;s new</h2>
+<p>For the most common configuration (with at most one power-supply for the
+device), all consumer drivers had pretty much identical DVFS code. And it made
+sense to let the OPP core configure the device to a particular OPP and simplify
+the consumer drivers by removing such code from them.</p>
+<p>During the 4.6 development cycle, the OPP core gained the functionality to
+perform DVFS on behalf of the consumer driver. The consumer driver needs to pass
+a target frequency, and the OPP core finds and sets the best possible OPP
+corresponding to that.</p>
+<p>In order to perform DVFS on behalf of the consumer driver, the OPP core needs
+some of the device&#8217;s resources. Some of them are acquired automatically by the
+OPP core, while the core needs help from the driver to get others. And it is
+important for driver writers to understand the expectations of the OPP core
+before they try to use the OPP core to do DVFS for their devices.</p>
+<p>In order to change the frequency of a device, the OPP core needs the pointer of
+the <em>struct clk</em> for the device. The OPP core gets this automatically by calling
+clk_get() using the device&#8217;s <em>struct device</em> pointer.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>clk = clk_get(dev, NULL);</code></pre>
+</td></tr></table>
+<p>Here, <em>clk</em> is the pointer to the clock structure, and <em>dev</em> is the pointer to
+the device structure. The consumer driver must make sure that the device has a
+valid clock registered for it with the clock framework, otherwise the OPP core
+will fail to do DVFS for the device.</p>
+<p>Voltage scaling isn&#8217;t always required while doing frequency scaling and so
+acquiring the power-supply resources is optional. But for the platforms that
+need to do voltage scaling, the OPP core needs some input from the consumer
+driver. The OPP core supports devices that don&#8217;t need power-supply, or need
+single or multiple power-supplies. The consumer driver needs to provide the
+names of all the power-supplies to the OPP core, that are required to be
+configured to perform DVFS for the device. The consumer driver needs to call the
+below routine only once for a device, and the OPP core will acquire the required
+power-supply resources for the device.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[], unsigned int count);</code></pre>
+</td></tr></table>
+<p>Here, <em>dev</em> is the pointer to the device structure, <em>names</em> is the pointer to an
+array of power-supply names and <em>count</em> is the number of entries in that array.
+This routine returns a pointer to the <em>struct opp_table</em> for the device on
+success and an error number (encoded as a pointer with ERR_PTR()) if something goes wrong. The
+order in which the names of the power-supplies are present in this array is
+significant. The OPP core assumes that the entries in the <em>opp-microvolt</em>
+property in the OPP table in DT will be present in the same order as in the
+array. Refer to the example at the end for more on <em>opp-microvolt</em> property. If
+the helper <em>dev_pm_opp_set_regulators()</em> isn&#8217;t called for a device, the OPP core
+assumes that the device doesn&#8217;t need to participate in voltage scaling and that
+frequency scaling can be done independently.</p>
+<p>The OPP core in turn calls the below routine for each string present in the
+<em>names</em> array. If the OPP core fails to get the regulator corresponding to any
+of the strings, it returns with an error from <em>dev_pm_opp_set_regulators()</em>.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>regulator_get_optional(dev, names[i]);</code></pre>
+</td></tr></table>
+<p>Here, <em>dev</em> is the pointer to the device structure and <em>names[i]</em> represents an
+entry in the <em>names</em> array.</p>
+<p>Once the consumer driver is done with the OPP table, it should free the
+resources acquired by the OPP core using the following routine:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>void dev_pm_opp_put_regulators(struct opp_table *opp_table);</code></pre>
+</td></tr></table>
+<p>Here, <em>opp_table</em> is the pointer to the OPP table, earlier returned by
+<em>dev_pm_opp_set_regulators()</em>.</p>
+<p><strong>Let&#8217;s do DVFS now!</strong></p>
+<p>Once the OPP core has all the resources it needs to do DVFS for a device, the
+consumer drivers can use the helpers described below to let the OPP core perform
+DVFS on its behalf. DVFS methods differ a bit depending on the number of
+power-supplies required to be configured for the device. In the most common cases
+the OPP core either needs to do only frequency scaling (no power-supply) or need
+to do voltage scaling for a single power-supply along with it. For such
+platforms, the consumer driver needs to call the below helper to let the OPP
+core do DVFS for the device.</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);</code></pre>
+</td></tr></table>
+<p>Here, <em>dev</em> is the pointer to the device structure, and <em>target_freq</em> is the
+frequency we need to program the device for. This routine configures the device
+for the <em>ceil</em> OPP (OPP with lowest frequency greater than or equal to the
+target frequency) corresponding to the target frequency. This routine returns
+zero on success and a negative error number otherwise.</p>
+<p>If the device doesn&#8217;t need to do voltage scaling at all, then
+<em>dev_pm_opp_set_rate()</em> can be called without calling
+<em>dev_pm_opp_set_regulators()</em> earlier. Otherwise, <em>dev_pm_opp_set_regulators()</em>
+must be called successfully before calling <em>dev_pm_opp_set_rate()</em>. If the
+target OPP has higher frequency than the current OPP, then
+<em>dev_pm_opp_set_rate()</em> does voltage scaling before doing frequency scaling.
+Otherwise frequency scaling is done before voltage scaling.</p>
+<p>The handling is a bit different in the complex cases where voltage scaling of
+multiple power-supplies is required to be done. The order in which multiple
+power-supplies need to be programmed is pretty much platform specific and it is
+very difficult to come up with common code that can work for all of them. To
+simplify things, the OPP core provides the capability to provide platform
+specific <em>set_opp()</em> callbacks, which will be called by the OPP core from within
+<em>dev_pm_opp_set_rate()</em> at the time of DVFS.</p>
+<p>The platform specific callback can be registered using:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data));</code></pre>
+</td></tr></table>
+<p>Here, <em>dev</em> is the pointer to the device structure, and <em>set_opp</em> is the
+platform specific callback. The callback takes <em>struct dev_pm_set_opp_data</em> as
+argument, which contains all the configuration the callback needs to do DVFS,
+and returns 0 on success and negative error number otherwise. This helper
+returns a pointer to the <em>struct opp_table</em> for the device on success and an
+error number (encoded as a pointer with ERR_PTR()) if something goes wrong.</p>
+<p>The platform specific callback should be unregistered using the following routine
+after the consumer driver is done with the OPP table:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table);</code></pre>
+</td></tr></table>
+<p>Here, <em>opp_table</em> is the pointer to the OPP table, earlier returned by
+<em>dev_pm_opp_register_set_opp_helper()</em>.</p>
+<p>Here is an example that connects the dots to explain how it all fits together. We
+have two CPU devices here (that share their clock/voltage rails) and we need to
+configure a single power-supply to perform DVFS for them.</p>
+<ul>
+<li>
+<p>
+Device Tree using <em>operating-points-v2</em> bindings
+</p>
+</li>
+</ul>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code> / {
+ cpus {
+ #address-cells = <b>&lt;1&gt;</b>;
+ #size-cells = <b>&lt;0&gt;</b>;
+
+ cpu@0 {
+ compatible = "arm,cortex-a9";
+ reg = <b>&lt;0&gt;</b>;
+ next-level-cache = &lt;&amp;L2&gt;;
+ clocks = &lt;&amp;clk_controller 0&gt;;
+ clock-names = "cpu";
+ vdd-supply = &lt;&amp;vdd_supply0&gt;;
+ operating-points-v2 = &lt;&amp;cpu_opp_table&gt;;
+ };
+
+ cpu@1 {
+ compatible = "arm,cortex-a9";
+ reg = <b>&lt;1&gt;</b>;
+ next-level-cache = &lt;&amp;L2&gt;;
+ clocks = &lt;&amp;clk_controller 0&gt;;
+ clock-names = "cpu";
+ vdd-supply = &lt;&amp;vdd_supply0&gt;;
+ operating-points-v2 = &lt;&amp;cpu_opp_table&gt;;
+ };
+ };
+
+ cpu_opp_table: opp_table {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp@1000000000 {
+ opp-hz = /bits/ 64 <b>&lt;1000000000&gt;</b>;
+ opp-microvolt = &lt;990000 1000000 1050000&gt;;
+ opp-microamp = <b>&lt;70000&gt;</b>;
+ clock-latency-ns = <b>&lt;300000&gt;</b>;
+ opp-suspend;
+ };
+ opp@1100000000 {
+ opp-hz = /bits/ 64 <b>&lt;1100000000&gt;</b>;
+ opp-microvolt = &lt;1090000 1100000 1150000&gt;;
+ opp-microamp = <b>&lt;80000&gt;</b>;
+ clock-latency-ns = <b>&lt;310000&gt;</b>;
+ };
+ opp@1200000000 {
+ opp-hz = /bits/ 64 <b>&lt;1200000000&gt;</b>;
+ opp-microvolt = &lt;1190000 1200000 1250000&gt;;
+ opp-microamp = <b>&lt;90000&gt;</b>;
+ clock-latency-ns = <b>&lt;290000&gt;</b>;
+ turbo-mode;
+ };
+ };
+ };</code></pre>
+</td></tr></table>
+<ul>
+<li>
+<p>
+Platform specific code
+</p>
+</li>
+</ul>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code> const char *name[] = {"vdd"};
+ struct opp_table *opp_table;
+
+	opp_table = dev_pm_opp_set_regulators(dev, name, ARRAY_SIZE(name));
+	if (IS_ERR(opp_table))
+		dev_err(dev, "Failed to set regulators: %ld\n", PTR_ERR(opp_table));</code></pre>
+</td></tr></table>
+<ul>
+<li>
+<p>
+Consumer driver responsible for DVFS
+</p>
+</li>
+</ul>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code> ret = dev_pm_opp_set_rate(dev, target_freq);
+ if (ret)
+ dev_err(dev, "Failed to set rate: %d\n", ret);</code></pre>
+</td></tr></table>
+<p>With these enhancements in the OPP core, using the standard interfaces like
+clocks and regulators, the device drivers are simplified to great extent. Going
+forward we should enhance the OPP core further to keep all future DVFS related
+configurations at a single place.</p>
+<p><a href="mailto:viresh.kumar@linaro.org">Viresh Kumar</a></p>
+<p></p>
+<p></p>
+<hr><p><small>
+Last updated
+ 2017-04-10 13:39:11 IST
+</small></p>
+</body>
+</html>
diff --git a/opp/use_opp_to_do_dvfs.txt b/opp/use_opp_to_do_dvfs.txt
new file mode 100644
index 0000000..f9ff1b7
--- /dev/null
+++ b/opp/use_opp_to_do_dvfs.txt
@@ -0,0 +1,296 @@
+DVFS simplified with the OPP library
+====================================
+
+Until Linux kernel release link:https://lwn.net/Articles/679931[4.5], the
+operating performance points (OPP) framework was acting as a helper library that
+provided tables of voltage-frequency pairs (with some additional information)
+for Linux kernel devices. Linux kernel frameworks, like cpufreq and devfreq,
+used these OPP tables to perform dynamic voltage and frequency scaling
+(link:https://en.wikipedia.org/wiki/Dynamic_voltage_scaling[DVFS]) for the
+devices. The OPP framework creates these tables dynamically via platform
+specific code and statically from Device Tree
+(link:https://www.kernel.org/doc/Documentation/devicetree/usage-model.txt[DT])
+blobs.
+
+The OPP framework gained the infrastructure to do DVFS on behalf of a consumer
+driver during the link:https://lwn.net/Articles/687511[4.6] development cycle.
+This helped reduce the complexity of the device drivers, which can now focus on
+their platform specific details.
+
+The rest of this article discusses what has changed and how we can use it for
+our device drivers.
+
+Operating performance point (OPP)
+---------------------------------
+
+link:https://en.wikipedia.org/wiki/System_on_a_chip[SoCs] have become very
+complex and power-efficient nowadays. There are multiple sub-modules within
+the SoC that work in conjunction all the time, but not all of them are required
+to function at their highest performance
+link:https://en.wikipedia.org/wiki/Frequency[frequency] and
+link:https://en.wikipedia.org/wiki/Voltage[voltage] levels, as that makes them
+less power-efficient. Devices (like
+link:https://en.wikipedia.org/wiki/Central_processing_unit[CPUs],
+link:https://en.wikipedia.org/wiki/Graphics_processing_unit[GPUs], IO devices,
+etc.) have the capability of working at a range of frequency and voltage pairs.
+They stay at lower frequencies when the system load is low and at higher
+frequencies otherwise.
+
+The set of discrete tuples, consisting of (but not limited to) frequency and
+voltage pairs, that the device supports is called its 'Operating Performance
+Points' (OPPs).
+
+For example, a CPU core which supports {1.0 GHz at minimum voltage 1.0 V}, {1.1
+GHz at minimum voltage 1.1 V}, and {1.2 GHz at minimum voltage 1.2 V} can be
+represented by these OPP tuples:
+
+----
+ Hz uV
+{1000000000, 1000000}
+{1100000000, 1100000}
+{1200000000, 1200000}
+----
+
+These tuples may contain more configurable values as well, for example voltage
+levels for multiple power-supplies. The example at the end of this article shows
+how the OPP nodes are present in a Device Tree (DT) blob.
+
+
+Before Linux kernel release 4.6, the OPP framework was responsible for creating
+an OPP table by parsing the OPP table from DT (or via the platform specific
+code) and for providing a set of helpers to inquire about the target OPPs, for
+example, finding the floor or ceiling OPP corresponding to the target frequency.
+
+The consumer drivers of the OPP library, like .../cpufreq/cpufreq-dt.c, use the
+helpers to find an OPP corresponding to the target frequency and use it to
+configure the device's clock and power-supplies (if required).
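+
+As a rough illustration of that older pattern (a simplified sketch only: it
+ignores the locking and reference management around OPP lookups of that era,
+and assumes the driver has already acquired 'clk', 'reg' and 'target_freq'):
+
+----
+	struct dev_pm_opp *opp;
+	unsigned long freq = target_freq, volt;
+	int ret;
+
+	/* Find the OPP with the lowest frequency >= target_freq */
+	opp = dev_pm_opp_find_freq_ceil(dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	volt = dev_pm_opp_get_voltage(opp);
+
+	/* Program the power-supply first, then the clock (when scaling up) */
+	ret = regulator_set_voltage(reg, volt, volt);
+	if (!ret)
+		ret = clk_set_rate(clk, freq);
+----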
+
+
+What's new
+----------
+
+For the most common configurations (with at most one power-supply for the
+device), all consumer drivers had pretty much identical DVFS code. So it made
+sense to let the OPP core configure the device to a particular OPP and simplify
+the consumer drivers by removing such code from them.
+
+During the 4.6 development cycle, the OPP core gained the functionality to
+perform DVFS on behalf of the consumer driver. The consumer driver needs to pass
+a target frequency, and the OPP core finds and sets the best possible OPP
+corresponding to that.
+
+In order to perform DVFS on behalf of the consumer driver, the OPP core needs
+some of the device's resources. Some of them are acquired automatically by the
+OPP core, while the core needs help from the driver to get others. It is
+important for driver writers to understand the expectations of the OPP core
+before they try to use the OPP core to do DVFS for their devices.
+
+
+In order to change the frequency of a device, the OPP core needs a pointer to
+the 'struct clk' for the device. The OPP core gets this automatically by calling
+clk_get() using the device's 'struct device' pointer.
+
+----
+clk = clk_get(dev, NULL);
+----
+
+Here, 'clk' is the pointer to the clock structure, and 'dev' is the pointer to
+the device structure. The consumer driver must make sure that the device has a
+valid clock registered for it with the clock framework, otherwise the OPP core
+will fail to do DVFS for the device.
+
+
+Voltage scaling isn't always required while doing frequency scaling, and so
+acquiring the power-supply resources is optional. But for the platforms that
+need to do voltage scaling, the OPP core needs some input from the consumer
+driver. The OPP core supports devices that don't need a power-supply, or that
+need a single power-supply or multiple power-supplies. The consumer driver
+needs to provide the OPP core with the names of all the power-supplies that
+must be configured to perform DVFS for the device. The consumer driver needs to
+call the routine below only once for a device, and the OPP core will acquire
+the required power-supply resources for the device.
+
+----
+struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[], unsigned int count);
+----
+
+Here, 'dev' is the pointer to the device structure, 'names' is the pointer to an
+array of power-supply names and 'count' is the number of entries in that array.
+This routine returns a pointer to the 'struct opp_table' for the device on
+success and an error number (encoded as a pointer with ERR_PTR()) if something
+goes wrong. The order in which the names of the power-supplies are present in
+this array is significant. The OPP core assumes that the entries in the
+'opp-microvolt' property in the OPP table in DT are present in the same order
+as in the array. Refer to the example at the end for more on the
+'opp-microvolt' property. If the helper 'dev_pm_opp_set_regulators()' isn't
+called for a device, the OPP core assumes that the device doesn't need to
+participate in voltage scaling and that frequency scaling can be done
+independently.
+
+The OPP core in turn calls the below routine for each string present in the
+'names' array. If the OPP core fails to get the regulator corresponding to any
+of the strings, it returns with an error from 'dev_pm_opp_set_regulators()'.
+
+----
+regulator_get_optional(dev, names[i]);
+----
+
+Here, 'dev' is the pointer to the device structure and 'names[i]' represents an
+entry in the 'names' array.
+
+Once the consumer driver is done with the OPP table, it should free the
+resources acquired by the OPP core using the following routine:
+
+----
+void dev_pm_opp_put_regulators(struct opp_table *opp_table);
+----
+
+Here, 'opp_table' is the pointer to the OPP table, earlier returned by
+'dev_pm_opp_set_regulators()'.
+
+
+*Let's do DVFS now!*
+
+Once the OPP core has all the resources it needs to do DVFS for a device, the
+consumer drivers can use the helpers described below to let the OPP core perform
+DVFS on their behalf. DVFS methods differ a bit depending on the number of
+power-supplies required to be configured for the device. In the most common
+cases, the OPP core either needs to do only frequency scaling (no power-supply)
+or needs to do voltage scaling for a single power-supply along with it. For such
+platforms, the consumer driver needs to call the below helper to let the OPP
+core do DVFS for the device.
+
+----
+int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
+----
+
+Here, 'dev' is the pointer to the device structure, and 'target_freq' is the
+frequency we need to program the device for. This routine configures the device
+for the 'ceil' OPP, i.e. the OPP with the lowest frequency greater than or
+equal to the target frequency. This routine returns zero on success and a
+negative error number otherwise.
+
+If the device doesn't need to do voltage scaling at all, then
+'dev_pm_opp_set_rate()' can be called without calling
+'dev_pm_opp_set_regulators()' earlier. Otherwise, 'dev_pm_opp_set_regulators()'
+must be called successfully before calling 'dev_pm_opp_set_rate()'. If the
+target OPP has a higher frequency than the current OPP, then
+'dev_pm_opp_set_rate()' does voltage scaling before doing frequency scaling.
+Otherwise, frequency scaling is done before voltage scaling.
+
+
+The handling is a bit different in the complex cases where voltage scaling of
+multiple power-supplies is required. The order in which multiple power-supplies
+need to be programmed is pretty much platform specific, and it is very difficult
+to come up with common code that works for all of them. To simplify things, the
+OPP core allows platform specific 'set_opp()' callbacks to be registered, which
+will be called by the OPP core from within 'dev_pm_opp_set_rate()' at the time
+of DVFS.
+
+The platform specific callback can be registered using:
+
+----
+struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data));
+----
+
+Here, 'dev' is the pointer to the device structure, and 'set_opp' is the
+platform specific callback. The callback takes 'struct dev_pm_set_opp_data' as
+argument, which contains all the configuration the callback needs to do DVFS,
+and returns zero on success and a negative error number otherwise. This helper
+returns a pointer to the 'struct opp_table' for the device on success and an
+error number (encoded as a pointer with ERR_PTR()) if something goes wrong.
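+
+To make this more concrete, below is a very rough skeleton of such a callback
+for a device with a single power-supply ('foo_set_opp' is just a placeholder
+name). It is only a sketch: the fields of 'struct dev_pm_set_opp_data' used
+here (the 'old_opp'/'new_opp' information, the 'regulators' array and the 'clk'
+pointer) should be checked against the kernel version in use, and a real
+implementation would also restore the old voltage if the clock change fails.
+
+----
+static int foo_set_opp(struct dev_pm_set_opp_data *data)
+{
+	struct dev_pm_opp_supply *supply = data->new_opp.supplies;
+	unsigned long old_freq = data->old_opp.rate;
+	unsigned long new_freq = data->new_opp.rate;
+	int ret;
+
+	/* Scaling up: raise the supply voltage before the clock rate */
+	if (new_freq > old_freq) {
+		ret = regulator_set_voltage_triplet(data->regulators[0],
+						    supply->u_volt_min,
+						    supply->u_volt,
+						    supply->u_volt_max);
+		if (ret)
+			return ret;
+	}
+
+	ret = clk_set_rate(data->clk, new_freq);
+	if (ret)
+		return ret;
+
+	/* Scaling down: lower the supply voltage after the clock rate */
+	if (new_freq < old_freq)
+		ret = regulator_set_voltage_triplet(data->regulators[0],
+						    supply->u_volt_min,
+						    supply->u_volt,
+						    supply->u_volt_max);
+
+	return ret;
+}
+----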
+
+The platform specific callback should be unregistered using the following
+routine after the consumer driver is done with the OPP table:
+
+----
+void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table);
+----
+
+Here, 'opp_table' is the pointer to the OPP table, earlier returned by
+'dev_pm_opp_register_set_opp_helper()'.
+
+
+Here is an example that connects the dots to explain how it all fits together.
+We have two CPU devices here (that share their clock/voltage rails) and we need
+to configure a single power-supply to perform DVFS for them.
+
+- Device Tree using 'operating-points-v2' bindings
+
+----
+ / {
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ cpu@0 {
+ compatible = "arm,cortex-a9";
+ reg = <0>;
+ next-level-cache = <&L2>;
+ clocks = <&clk_controller 0>;
+ clock-names = "cpu";
+ vdd-supply = <&vdd_supply0>;
+ operating-points-v2 = <&cpu_opp_table>;
+ };
+
+ cpu@1 {
+ compatible = "arm,cortex-a9";
+ reg = <1>;
+ next-level-cache = <&L2>;
+ clocks = <&clk_controller 0>;
+ clock-names = "cpu";
+ vdd-supply = <&vdd_supply0>;
+ operating-points-v2 = <&cpu_opp_table>;
+ };
+ };
+
+ cpu_opp_table: opp_table {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp@1000000000 {
+ opp-hz = /bits/ 64 <1000000000>;
+ opp-microvolt = <990000 1000000 1050000>;
+ opp-microamp = <70000>;
+ clock-latency-ns = <300000>;
+ opp-suspend;
+ };
+ opp@1100000000 {
+ opp-hz = /bits/ 64 <1100000000>;
+ opp-microvolt = <1090000 1100000 1150000>;
+ opp-microamp = <80000>;
+ clock-latency-ns = <310000>;
+ };
+ opp@1200000000 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <1190000 1200000 1250000>;
+ opp-microamp = <90000>;
+ clock-latency-ns = <290000>;
+ turbo-mode;
+ };
+ };
+ };
+----
+
+- Platform specific code
+
+----
+	const char *name[] = {"vdd"};
+	struct opp_table *opp_table;
+
+	opp_table = dev_pm_opp_set_regulators(dev, name, ARRAY_SIZE(name));
+	if (IS_ERR(opp_table))
+		dev_err(dev, "Failed to set regulators: %ld\n", PTR_ERR(opp_table));
+----
+
+- Consumer driver responsible for DVFS
+
+----
+ ret = dev_pm_opp_set_rate(dev, target_freq);
+ if (ret)
+ dev_err(dev, "Failed to set rate: %d\n", ret);
+----
+
+With these enhancements in the OPP core, which use standard interfaces like
+clocks and regulators, the device drivers are simplified to a great extent.
+Going forward, we should enhance the OPP core further to keep all future
+DVFS-related configuration in a single place.
diff --git a/workqueue/delayed-wq.png b/workqueue/delayed-wq.png
new file mode 100644
index 0000000..0d5ceb2
--- /dev/null
+++ b/workqueue/delayed-wq.png
Binary files differ
diff --git a/workqueue/power_efficient_workqueue.html b/workqueue/power_efficient_workqueue.html
new file mode 100644
index 0000000..3aa05b5
--- /dev/null
+++ b/workqueue/power_efficient_workqueue.html
@@ -0,0 +1,191 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta name="generator" content="AsciiDoc 8.6.9">
+<title>Power efficient workqueue</title>
+</head>
+<body>
+<h1>Power efficient workqueue</h1>
+<p>
+</p>
+<a name="preamble"></a>
+<p>Power-efficient workqueues were first introduced in the
+Linux kernel release <a href="https://lwn.net/Articles/565482/">3.11</a> and few (50+)
+subsystems and drivers were updated to use them later on. These workqueues can
+be quite useful in case of handheld devices (like tablets and smartphones),
+where we want to be power efficient all the time.</p>
+<p>ARM platforms with power-efficient workqueues enabled on Ubuntu and Android have
+shown significant improvements in energy consumption (up to 15% for some use
+cases on <a href="https://developer.arm.com/technologies/big-little">ARM big LITTLE</a>
+platforms). They are worth trying if you care for power.</p>
+<hr>
+<h2><a name="_workqueue"></a>Workqueue</h2>
+<p>Workqueues (wq) are the most common bottom-half mechanism used in the Linux
+kernel for cases where asynchronous execution context is required. The
+asynchronous execution context is provided by the <strong>worker</strong> kernel threads. These
+workers are woken up once a work item is queued on them. The workqueue is
+represented by the <code>struct workqueue_struct</code> and the work item is represented by
+the <code>struct work_struct</code>. The work item points to a function, which is called by
+the worker (in process context) to execute the work. Once the worker has
+finished processing all the work items queued on the workqueue, it becomes idle.</p>
+<p>The most common APIs used to queue a work are:</p>
+<pre><code> bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
+ bool queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work);
+ bool queue_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay);
+ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *work, unsigned long delay);</code></pre>
+<p>The first two APIs queue the work immediately, while the other two queue it up
+after <code>delay</code> jiffies have passed since the time the APIs are called. The work
+queued by <code>queue_work_on()</code> (and <code>queue_delayed_work_on()</code>) is executed by the
+worker thread running on a specific CPU only. Whereas, the work queued by
+<code>queue_work()</code> (and <code>queue_delayed_work()</code>) can be run by any CPU in the system
+(though it doesn&#8217;t really happen that way, described later).</p>
+<hr>
+<h2><a name="_the_workqueue_pinning_problem"></a>The workqueue pinning problem</h2>
+<p>A fairly common use-case of workqueues in kernel is to repetitively queue the
+work from the work function, as we need to do some task periodically.</p>
+<p>Below shows example code for this use-case:</p>
+<pre><code> static void foo_handler(struct work_struct *work)
+ {
+ struct delayed_work *dwork = to_delayed_work(work);
+
+ /* Do some work here */
+
+ /* Queue the work again */
+ queue_delayed_work(system_wq, dwork, 10);
+ }
+
+ void foo_init(void)
+ {
+ struct delayed_work *dwork = kmalloc(sizeof(*dwork), GFP_KERNEL);
+
+ if (!dwork)
+ return;
+
+ INIT_DEFERRABLE_WORK(dwork, foo_handler);
+
+ /* Queue work after 10 jiffies */
+ queue_delayed_work(system_wq, dwork, 10);
+ }</code></pre>
+<p><code>foo_init()</code> allocated the delayed work and queued it for 10 jiffies delay and
+the work handler (<code>foo_handler()</code>) performed the periodic work and queued the
+work again. One may think that the work will be executed on any CPU (whichever
+the kernel finds to be most appropriate). But that&#8217;s not really true.</p>
+<p>The workqueue core will most likely queue it on the local CPU, unless the local
+CPU isn&#8217;t part of the global <code>wq_unbound_cpumask</code> (i.e. Mask of CPUs which are
+allowed to execute work that isn&#8217;t queued to a particular CPU). For an octa-core
+platform, the work from the above example will get executed only on CPU-X
+every time. Even if that CPU was idle and some of the other 7 CPUs were not.</p>
+<p>The below figure shows this problem:</p>
+<div align="center">
+<img src="./delayed-wq.png" style="border-width: 0;" alt="./delayed-wq.png" width="700" height="400">
+<p><b>Figure 1. </b>The workqueue pinning problem</p>
+</div>
+<p>The delayed work was first queued by code running on CPU1. The CPU became idle
+and went to a low power state. After 10 jiffies, the timer <code>Tw</code> (internally used
+by delayed work infrastructure) fired on CPU1 and brought it out of the low
+power state. The <code>dwork.work</code> is queued from the timer handler and is serviced
+very soon by a worker thread running on CPU1. The work handler (<code>foo_handler()</code>)
+queued the delayed work again, which internally re-armed the timer <code>Tw</code> to fire
+after 10 jiffies. The CPU had nothing else to do and went into low power state
+again. And again the timer <code>Tw</code> brought it out of the low power state after
+10 jiffies.</p>
+<p>It is probably fine (from power-efficiency point of view) if the CPU was doing
+some work while being interrupted by the timer <code>Tw</code>. But if the CPU is brought
+out of idle only to service the timer and queue the work, it can be very bad
+from power-efficiency point of view. The CPU can be in a deep idle state (deeper
+the state, more is the penalty), or the CPU can be part of a cluster where all
+CPUs are in low power state (and hence the cluster).</p>
+<p>This pinning isn&#8217;t probably good performance wise as well in certain cases, as
+the selected CPU may not be the best available CPU to run <code>foo_handler()</code>. Over
+that, the scheduler wouldn&#8217;t be able to load balance this work with other CPUs
+and the response time of the <code>foo_handler()</code> may increase if the target CPU is
+currently busy.</p>
+<p>There are some benefits of this pinning behavior though. If the <code>foo_handler()</code>
+operates on significant amount of data, it may be better to keep it executing
+on the same CPU every time as the caches would already be hot and we wouldn&#8217;t
+have unnecessary cache misses. Though there are fairly small number of such
+cases where the handlers operate on such data.</p>
+<hr>
+<h2><a name="_power_efficient_workqueue"></a>Power-efficient workqueue</h2>
+<p>The power-efficient workqueue infrastructure is disabled by default. And it can
+be enabled in two ways:</p>
+<ul>
+<li>
+<p>
+Enable <code>CONFIG_WQ_POWER_EFFICIENT</code> in configuration file.
+</p>
+</li>
+<li>
+<p>
+Pass <code>workqueue.power_efficient=true</code> in boot arguments.
+</p>
+</li>
+<li>
+<p>
+<code>workqueue.power_efficient=false</code> can also be used to disable them if enabled
+ in the configuration file.
+</p>
+</li>
+</ul>
+<p>Once the power-efficient workqueue functionality is enabled, a workqueue can be
+made power-efficient by passing the <code>WQ_POWER_EFFICIENT</code> flag to the
+<code>alloc_workqueue()</code> helper while allocating the workqueue. Internally, the
+workqueue core marks the workqueue as <code>WQ_UNBOUND</code> if the power-efficient
+workqueue feature is enabled.</p>
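+<p>For example, a driver that wants its own power-efficient workqueue (the
+workqueue name "foo_wq" and the work item <code>foo_work</code> below are just
+placeholders) could do something like:</p>
+<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td>
+<pre><code>	/* Becomes an unbound workqueue when power-efficient wq is enabled */
+	struct workqueue_struct *foo_wq;
+
+	foo_wq = alloc_workqueue("foo_wq", WQ_POWER_EFFICIENT, 0);
+	if (!foo_wq)
+		return -ENOMEM;
+
+	/* foo_work is a work_struct initialized elsewhere with INIT_WORK() */
+	queue_work(foo_wq, &amp;foo_work);</code></pre>
+</td></tr></table>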
+<p>Instead of queuing work on the local CPU, the workqueue core asks the scheduler
+to provide the target CPU for the work queued on unbound workqueues (i.e. ones
+that have <code>WQ_UNBOUND</code> set in their flags). And the work doesn&#8217;t get pinned to a
+CPU anymore.</p>
+<p>But does the scheduler pick the best CPU from a power-efficiency point of view?
+Not always.</p>
+<p>The scheduler algorithm responsible for picking the next CPU for a task is quite
+complex, and we can&#8217;t possibly add every detail in this article. But more likely
+than not, the scheduler picks the idlest (least busy) CPU from the CPUs at the
+last cache level. For a multi cluster platform, it will most likely pick a CPU
+from the same cluster. But if the work handler doesn&#8217;t finish up quickly, load
+balancing will happen and that will move the task to other (possibly idle) CPUs.</p>
+<p>That isn&#8217;t very power-efficient then, is it? With the current design
+of the Linux kernel scheduler, we may not get the best results with power-efficient
+workqueues. There is ongoing work (strongly pushed by the ARM community) to
+make the scheduler more power-aware and power-efficient and power-efficient
+workqueues will be very useful then.</p>
+<p>Currently, they are quite useful (from power-efficiency point of view) with the
+Android kernel as that carries some scheduler modifications to make it energy
+aware.</p>
+<p>As said earlier, power-efficient workqueues also make load balancing possible
+for these work items with the upstream kernel and that can be good in some cases
+for sure (where we wouldn&#8217;t have unnecessary cache misses).</p>
+<p>There are two system-level workqueues that set the necessary flag to make them
+power-efficient: <code>system_power_efficient_wq</code> and
+<code>system_freezable_power_efficient_wq</code>. And these can be used if you don&#8217;t need a
+private workqueue for your use-case.</p>
+<hr>
+<h2><a name="_power_numbers"></a>Power numbers</h2>
+<p><a href="https://lwn.net/Articles/548281/">Testing</a> was done on a 32-bit ARM big LITTLE
+(heterogeneous) platform (4 LITTLE Cortex-A7 cores and 4 big Cortex-A15 cores).
+Audio was played in the background using aplay while the rest of the system was
+fairly idle. Linaro&#8217;s ubuntu-devel distribution was used and the kernel also had
+some out-of-tree scheduler changes (the task placement patches discussed on LKML
+at the time).</p>
+<p>The results across multiple test iterations showed an average improvement of
+15.7% in energy consumption with the power-efficient workqueues enabled. The
+numbers shown here are in <code>Joules</code>.</p>
+<pre><code>                  Vanilla Kernel          Vanilla Kernel
+                + scheduler patches     + scheduler patches + power-efficient wq
+
+ A15 cluster 0.322866 0.2289042
+ A7 cluster 2.619137 2.2514632
+
+ Total 2.942003 2.4803674</code></pre>
+<p>With the mainline kernel, the power-efficient workqueues will give better
+results today as well (as the scheduler picks the best target CPU) and it is
+going to further improve as and when the scheduler gets more energy aware.</p>
+<p>You should try enabling it for your platform if you care for power.</p>
+<p></p>
+<p></p>
+<hr><p><small>
+Last updated
+ 2017-08-11 17:09:52 IST
+</small></p>
+</body>
+</html>
diff --git a/workqueue/power_efficient_workqueue.txt b/workqueue/power_efficient_workqueue.txt
new file mode 100644
index 0000000..b03b348
--- /dev/null
+++ b/workqueue/power_efficient_workqueue.txt
@@ -0,0 +1,204 @@
+Power efficient workqueue
+=========================
+
+Power-efficient workqueues were first introduced in the
+Linux kernel release link:https://lwn.net/Articles/565482/[3.11], and quite a
+few (50+) subsystems and drivers were later updated to use them. These
+workqueues can be quite useful for handheld devices (like tablets and
+smartphones), where we want to be power efficient all the time.
+
+ARM platforms with power-efficient workqueues enabled on Ubuntu and Android have
+shown significant improvements in energy consumption (up to 15% for some use
+cases on link:https://developer.arm.com/technologies/big-little[ARM big LITTLE]
+platforms). They are worth trying if you care about power.
+
+
+Workqueue
+---------
+
+Workqueues (wq) are the most common bottom-half mechanism used in the Linux
+kernel for cases where asynchronous execution context is required. The
+asynchronous execution context is provided by the *worker* kernel threads. These
+workers are woken up once a work item is queued on them. The workqueue is
+represented by the `struct workqueue_struct` and the work item is represented by
+the `struct work_struct`. The work item points to a function, which is called by
+the worker (in process context) to execute the work. Once the worker has
+finished processing all the work items queued on the workqueue, it becomes idle.
+
+The most common APIs used to queue a work item are:
+
+....
+ bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
+ bool queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work);
+ bool queue_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay);
+ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *work, unsigned long delay);
+....
+
+The first two APIs queue the work immediately, while the other two queue it
+after `delay` jiffies have passed from the time the API is called. The work
+queued by `queue_work_on()` (and `queue_delayed_work_on()`) is executed only by
+the worker thread running on the specified CPU, whereas the work queued by
+`queue_work()` (and `queue_delayed_work()`) can be run by any CPU in the system
+(though, as described later, that isn't what happens in practice).
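+
+Continuing the sketch above, the same work item can be queued either way; the
+choice of CPU 2 below is an arbitrary illustration, assuming that CPU is online:
+
+....
+    static void foo_queue(bool pin_to_cpu2)
+    {
+            if (pin_to_cpu2)
+                    /* Run foo_work in the worker thread of CPU 2 only */
+                    queue_work_on(2, system_wq, &foo_work);
+            else
+                    /* Let the workqueue core pick the CPU (usually the local one) */
+                    queue_work(system_wq, &foo_work);
+    }
+....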
+
+
+The workqueue pinning problem
+-----------------------------
+
+A fairly common use case of workqueues in the kernel is to repeatedly queue the
+work from its own work function, because some task needs to be done
+periodically.
+
+The example code below illustrates this use case:
+
+....
+ static void foo_handler(struct work_struct *work)
+ {
+ struct delayed_work *dwork = to_delayed_work(work);
+
+ /* Do some work here */
+
+ /* Queue the work again */
+ queue_delayed_work(system_wq, dwork, 10);
+ }
+
+ void foo_init(void)
+ {
+ struct delayed_work *dwork = kmalloc(sizeof(*dwork), GFP_KERNEL);
+
+ if (!dwork)
+ return;
+
+        /* Regular (non-deferrable) delayed work: its timer can wake an idle CPU */
+        INIT_DELAYED_WORK(dwork, foo_handler);
+
+ /* Queue work after 10 jiffies */
+ queue_delayed_work(system_wq, dwork, 10);
+ }
+....
+
+`foo_init()` allocates the delayed work and queues it with a delay of 10
+jiffies; the work handler (`foo_handler()`) performs the periodic task and then
+queues the work again. One may think that the work will be executed on any CPU
+(whichever the kernel finds most appropriate), but that's not really true.
+
+The workqueue core will most likely queue it on the local CPU, unless the local
+CPU isn't part of the global `wq_unbound_cpumask` (i.e. the mask of CPUs that
+are allowed to execute work not queued to a particular CPU). On an octa-core
+platform, the work from the above example will get executed only on CPU-X every
+time, even if that CPU was idle and some of the other seven CPUs were not.
+
+The figure below illustrates this problem:
+
+image::./delayed-wq.png[title="The workqueue pinning problem",height=400,width=700,align="center"]
+
+The delayed work was first queued by code running on CPU1. The CPU then became
+idle and went into a low power state. After 10 jiffies, the timer `Tw` (used
+internally by the delayed work infrastructure) fired on CPU1 and brought it out
+of the low power state. The `dwork.work` item was queued from the timer handler
+and was serviced very soon afterwards by a worker thread running on CPU1. The
+work handler (`foo_handler()`) queued the delayed work again, which internally
+re-armed the timer `Tw` to fire after another 10 jiffies. The CPU had nothing
+else to do and went back into a low power state, and again the timer `Tw`
+brought it out of that state 10 jiffies later.
+
+It is probably fine (from a power-efficiency point of view) if the CPU was
+doing some other work when it was interrupted by the timer `Tw`. But if the CPU
+is brought out of idle only to service the timer and queue the work, it can be
+quite bad for power efficiency. The CPU may be in a deep idle state (the deeper
+the state, the higher the wakeup penalty), or it may be part of a cluster where
+all CPUs are in a low power state (and hence the whole cluster is as well).
+
+This pinning probably isn't good performance-wise either in certain cases, as
+the selected CPU may not be the best available CPU to run `foo_handler()`. On
+top of that, the scheduler isn't able to load balance this work across CPUs,
+so the response time of `foo_handler()` may increase if the target CPU is
+currently busy.
+
+There are some benefits to this pinning behavior though. If `foo_handler()`
+operates on a significant amount of data, it may be better to keep executing it
+on the same CPU every time, as the caches would already be hot and we wouldn't
+suffer unnecessary cache misses. However, there are fairly few cases where
+handlers operate on that much data.
+
+
+Power-efficient workqueue
+-------------------------
+
+The power-efficient workqueue infrastructure is disabled by default. It can be
+enabled in either of two ways (shown concretely below):
+
+* Enable `CONFIG_WQ_POWER_EFFICIENT_DEFAULT` in the kernel configuration.
+* Pass `workqueue.power_efficient=true` on the kernel command line.
+* Conversely, `workqueue.power_efficient=false` can be used to disable the
+  feature if it was enabled in the configuration.
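+
+Concretely, the two options above look like this (the lines merely restate the
+list; the `#` lines are comments):
+
+....
+    # In the kernel configuration:
+    CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
+
+    # Or on the kernel command line:
+    workqueue.power_efficient=true
+....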
+
+Once the power-efficient workqueue functionality is enabled, a workqueue can be
+made power-efficient by passing the `WQ_POWER_EFFICIENT` flag to the
+`alloc_workqueue()` helper when allocating the workqueue. Internally, the
+workqueue core then marks such a workqueue as `WQ_UNBOUND` if the
+power-efficient workqueue feature is enabled.
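+
+As a rough sketch (the workqueue name and the `foo_driver_*` functions are
+made-up illustrations, not existing kernel code), a driver could allocate such
+a workqueue like this:
+
+....
+    #include <linux/errno.h>
+    #include <linux/workqueue.h>
+
+    static struct workqueue_struct *foo_wq;
+
+    static int foo_driver_init(void)
+    {
+            /*
+             * With the power-efficient feature enabled, WQ_POWER_EFFICIENT
+             * makes this workqueue behave as an unbound (WQ_UNBOUND) one;
+             * otherwise it remains a normal per-CPU workqueue.
+             */
+            foo_wq = alloc_workqueue("foo_wq", WQ_POWER_EFFICIENT, 0);
+            if (!foo_wq)
+                    return -ENOMEM;
+
+            return 0;
+    }
+
+    static void foo_driver_exit(void)
+    {
+            destroy_workqueue(foo_wq);
+    }
+....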
+
+Instead of queuing work on the local CPU, the workqueue core asks the scheduler
+to pick the target CPU for work queued on unbound workqueues (i.e. ones that
+have `WQ_UNBOUND` set in their flags), and the work doesn't get pinned to a
+CPU anymore.
+
+But does the scheduler pick the best CPU from a power-efficiency point of view?
+Not always.
+
+The scheduler algorithm responsible for picking the next CPU for a task is quite
+complex, and we can't cover every detail in this article. But, more likely than
+not, the scheduler picks the idlest (least busy) CPU among those sharing the
+last-level cache. On a multi-cluster platform, it will most likely pick a CPU
+from the same cluster. But if the work handler doesn't finish quickly, load
+balancing will kick in and may move the task to other (possibly idle) CPUs.
+
+That isn't very power-efficient then, is it? Indeed, with the current design of
+the Linux kernel scheduler, we may not get the best results with power-efficient
+workqueues. There is ongoing work (strongly pushed by the ARM community) to make
+the scheduler more power-aware, and power-efficient workqueues will become even
+more useful then.
+
+Currently, they are quite useful (from a power-efficiency point of view) with
+the Android kernel, as it carries some scheduler modifications that make it
+energy aware.
+
+As said earlier, power-efficient workqueues also make load balancing of these
+work items possible with the upstream kernel, and that can be beneficial in some
+cases (those where we wouldn't suffer unnecessary cache misses anyway).
+
+There are two system-level workqueues that already set the necessary flag to
+make them power-efficient: `system_power_efficient_wq` and
+`system_freezable_power_efficient_wq`. These can be used if you don't need a
+private workqueue for your use case.
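+
+For instance, the periodic work from the earlier example could simply be
+re-queued on the shared power-efficient workqueue; only the workqueue argument
+changes:
+
+....
+    static void foo_handler(struct work_struct *work)
+    {
+            struct delayed_work *dwork = to_delayed_work(work);
+
+            /* Do the periodic work here */
+
+            /* Re-queue on the shared power-efficient workqueue */
+            queue_delayed_work(system_power_efficient_wq, dwork, 10);
+    }
+....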
+
+
+Power numbers
+-------------
+
+link:https://lwn.net/Articles/548281/[Testing] was done on a 32-bit ARM
+big.LITTLE (heterogeneous) platform (4 LITTLE Cortex-A7 cores and 4 big
+Cortex-A15 cores). Audio was played in the background using aplay while the
+rest of the system was fairly idle. Linaro's ubuntu-devel distribution was used,
+and the kernel also carried some out-of-tree scheduler changes (the task
+placement patches discussed on LKML at the time).
+
+The results across multiple test iterations showed an average improvement of
+15.7% in energy consumption with power-efficient workqueues enabled. The
+numbers shown below are in Joules.
+
+....
+
+                    Vanilla kernel        Vanilla kernel
+                    + scheduler patches   + scheduler patches + power-efficient wq
+
+    A15 cluster     0.322866              0.2289042
+    A7 cluster      2.619137              2.2514632
+
+    Total           2.942003              2.4803674
+....
+
+
+With the mainline kernel, power-efficient workqueues should give better results
+today as well (as the scheduler picks a better target CPU than the pinned one),
+and things will improve further as and when the scheduler gets more energy
+aware.
+
+You should try enabling them on your platform if you care about power
+consumption.