Age  Commit message  Author
2015-11-18  sched/cpufreq_sched: properly handle config as module  [v3.18/topic/EAS]  (Juri Lelli)
When cpufreq_sched is built as a module we have to handle it properly. Add related fixes. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-11-13  kernel/sched: guard task_fits_capacity call site with CONFIG_SMP  (Juri Lelli)
As task_fits_capacity is not defined on !CONFIG_SMP, we have to guard its calling sites. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-11-13  trace/events/sched: guard PELT related tracepoint with CONFIG_SMP  (Juri Lelli)
We have to guard PELT related tracepoints with CONFIG_SMP as some cfs_rq fields are not defined for !CONFIG_SMP. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-11-10  arm, arm64: guard arm_arch_scale_freq_capacity with CONFIG_CPU_FREQ  (Juri Lelli)
When !CONFIG_CPU_FREQ arm_arch_scale_freq_capacity can simply return SCHED_CAPACITY_SCALE. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
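A minimal sketch of the guard this describes (header-style; the constant names follow the kernel, but the simplified signature and layout here are illustrative):

	#define SCHED_CAPACITY_SHIFT	10
	#define SCHED_CAPACITY_SCALE	(1 << SCHED_CAPACITY_SHIFT)

	#ifdef CONFIG_CPU_FREQ
	extern unsigned long arm_arch_scale_freq_capacity(int cpu);
	#else
	/* Without cpufreq there is no frequency scaling to correct for,
	 * so simply report full capacity. */
	static inline unsigned long arm_arch_scale_freq_capacity(int cpu)
	{
		return SCHED_CAPACITY_SCALE;
	}
	#endif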
2015-11-10  sched/fair: move capacity_curr_of outside CONFIG_SMP  (Juri Lelli)
CONFIG_CPU_FREQ_GOV_SCHED configurations need to use capacity_curr_of, so move it outside the CONFIG_SMP regions. Once we do that, arch_scale_freq_capacity has to be changed as well, because struct sched_domain is not defined on !CONFIG_SMP. Luckily, the sd parameter is not used anywhere in that function, so we can simply drop it. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-11-10  kernel/sched: move cpu_capacity_orig outside CONFIG_SMP  (Juri Lelli)
cpu_capacity_orig might be used in !CONFIG_SMP configurations as well (e.g., when GOV_SCHED is compiled in). Move it outside the CONFIG_SMP boundaries. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-11-10  sched/Makefile: make energy.c unit dependent on CONFIG_SMP  (Juri Lelli)
On !CONFIG_SMP systems sched_group_energy is not declared (there is no energy model), so make the code that initialises such structures from DT dependent on CONFIG_SMP. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-11-10  sched/sched.h: fix build errors by guarding with CONFIG_CPU_FREQ  (Juri Lelli)
Build fails for !CONFIG_SMP configurations. Fix it by guarding code with proper ifdefs. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-11-10  sched/fair: fix build errors by guarding with CONFIG_SMP  (Juri Lelli)
Build fails for !CONFIG_SMP configurations. Fix it by guarding code with proper ifdefs. Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-10-05  DEBUG: sched/debug: Add energy procfs interface  (Dietmar Eggemann)
This patch makes the energy data available via procfs. The related files are placed in a sub-directory named 'energy' inside the /proc/sys/kernel/sched_domain/cpuX/domainY/groupZ directory for those cpu/domain/group tuples which have energy information. The following example depicts the contents of /proc/sys/kernel/sched_domain/cpu0/domain0/group[01] for a system which has energy information attached to domain level 0.

	├── cpu0
	│   ├── domain0
	│   │   ├── busy_factor
	│   │   ├── busy_idx
	│   │   ├── cache_nice_tries
	│   │   ├── flags
	│   │   ├── forkexec_idx
	│   │   ├── group0
	│   │   │   └── energy
	│   │   │       ├── cap_states
	│   │   │       ├── idle_states
	│   │   │       ├── nr_idle_states_below
	│   │   │       ├── nr_cap_states
	│   │   │       └── nr_idle_states
	│   │   ├── group1
	│   │   │   └── energy
	│   │   │       ├── cap_states
	│   │   │       ├── idle_states
	│   │   │       ├── nr_idle_states_below
	│   │   │       ├── nr_cap_states
	│   │   │       └── nr_idle_states
	│   │   ├── idle_idx
	│   │   ├── imbalance_pct
	│   │   ├── max_interval
	│   │   ├── max_newidle_lb_cost
	│   │   ├── min_interval
	│   │   ├── name
	│   │   ├── newidle_idx
	│   │   └── wake_idx
	│   └── domain1
	│       ├── busy_factor
	│       ├── busy_idx
	│       ├── cache_nice_tries
	│       ├── flags
	│       ├── forkexec_idx
	│       ├── idle_idx
	│       ├── imbalance_pct
	│       ├── max_interval
	│       ├── max_newidle_lb_cost
	│       ├── min_interval
	│       ├── name
	│       ├── newidle_idx
	│       └── wake_idx

The files 'nr_idle_states', 'nr_cap_states', and 'nr_idle_states_below' each contain a scalar value, whereas 'idle_states' contains a vector of power consumption values (one per idle state) and 'cap_states' contains a vector of (compute capacity, power consumption) pairs (one per capacity state). Change-Id: Ie0a039369c25403785afbde955dd75ddd1cfe3d5 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2015-10-05  DEBUG: schedtune: add tracepoint on boostgroup updates  (Patrick Bellasi)
Change-Id: I993971dde9af5e8914fc9dbc7c675a9f7bae0bba Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  DEBUG: schedtune: add tracepoint on P-E space filtering  (Patrick Bellasi)
Change-Id: I2947dd97573a41a8a7cac8a6f6467f3341f8009f Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  DEBUG: schedtune: add tracepoint for schedtune_tasks_update() values  (Patrick Bellasi)
Change-Id: Ieb6e2c7cf0ee09bfda0fae8f1d7ad747a0013540 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  DEBUG: schedtune: add tracepoint for energy_diff() values  (Patrick Bellasi)
Change-Id: I3aad4bfdbab392752637b5e04cfcb842d4b8206e Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  DEBUG: schedtune: add tracepoint for CPU boost signal  (Patrick Bellasi)
Change-Id: I02747984be22093d6c4e7a04f43cdd8adea13bd4 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  DEBUG: schedtune: add tracepoint for task boost signal  (Patrick Bellasi)
Change-Id: I026267388453754629e6323cb0e7bb14b4fe4598 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  DEBUG: schedtune: add tracepoint for SchedTune configuration update  (Patrick Bellasi)
Change-Id: I4fc761f304090a91662fd91143dda358b5b91f1d Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  DEBUG: arm64: add cpu_capacity change tracepoint  (Juri Lelli)
arm64 bits of the same debugging stuff. Change-Id: If59276ff59e4376d1f689247f5c94a632fd94715 Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-10-05  DEBUG: arm: add cpu_capacity change tracepoint  (Juri Lelli)
This is useful when we want to compare cpu utilization and cpu curr capacity side by side. Change-Id: I5ac9660a84de44ff6de58db87565e3b704cfd7ac Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-10-05  DEBUG: add cpu to load_avg_task tracepoint  (Juri Lelli)
This is useful when plotting where a task was running during an experiment. Change-Id: I5a90aaf3eb1254b4d1c6ec3d7e4159d9643b0b71 Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-10-05  DEBUG: load_avg_{cpu,task} and contrib_scale_f tracepoints  (Juri Lelli)
Change-Id: I923425a231cbbc5dfac86fc3e2a90459761bd2ea Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-10-05  FIXUP: sched/cpufreq_sched: Clear __sched_energy_freq only when no policy is using sched governor  (Ricky Liang)
The current implementation turns SchedDVFS off completely as soon as one cpufreq policy goes offline. This is because the {set|clear}_sched_energy_freq() functions are implemented in a way that only allows the __sched_energy_freq static_key to be either 1 or 0, while there are two cpufreq policies (one for each cluster) on Oak. The __sched_energy_freq static_key acts as a switch for the SchedDVFS hook in the CFS scheduler, so turning a cluster off, which sets the corresponding cpufreq policy offline, results in __sched_energy_freq being set to 0. Signed-off-by: Ricky Liang <jcliang@chromium.org> BUG=chrome-os-partner:45243 TEST=Build kernel and run on Oak rev3. Disable the big cluster with:
	echo 0 > /sys/devices/system/cpu/cpu2/online
	echo 0 > /sys/devices/system/cpu/cpu3/online
and run:
	watch -n 0.5 "cat /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state"
to make sure SchedDVFS is still adjusting the LITTLE cores' frequency. Change-Id: Id51ef2138f3f38c4e1b585b6c1fd4bc395ec9ea5
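A minimal userspace model of the counted enable/disable this fix implies (the atomics stand in for the kernel's static_key_slow_inc()/static_key_slow_dec(); the function names mirror the commit, the rest is illustrative):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Counts cpufreq policies currently using the sched governor. */
	static atomic_int sched_energy_freq_count;

	static void set_sched_energy_freq(void)   /* policy comes online */
	{
		atomic_fetch_add(&sched_energy_freq_count, 1);
	}

	static void clear_sched_energy_freq(void) /* policy goes offline */
	{
		atomic_fetch_sub(&sched_energy_freq_count, 1);
	}

	/* The CFS hook stays enabled while at least one policy uses it. */
	static bool sched_energy_freq(void)
	{
		return atomic_load(&sched_energy_freq_count) > 0;
	}

	int main(void)
	{
		set_sched_energy_freq();   /* LITTLE cluster policy */
		set_sched_energy_freq();   /* big cluster policy    */
		clear_sched_energy_freq(); /* big cluster offlined  */
		printf("hook enabled: %d\n", sched_energy_freq()); /* still 1 */
		return 0;
	}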
2015-10-05  FIXUP: sched/cpufreq_sched: Disable SchedDVFS until cpufreq is fully initialized  (Ricky Liang)
Signed-off-by: Ricky Liang <jcliang@chromium.org> BUG=chrome-os-partner:44780 TEST=Boot kernel on Oak rev3 and verify that no warnings are seen. Change-Id: I34b7a155091ec1adf6ca9792193d11920340c76a
2015-10-05  FIXUP: Fix incorrect default config param for CPU_FREQ_GOV_SCHED  (Kapileshwar Singh)
This patch fixes a build failure when CONFIG_CPU_FREQ_DEFAULT_GOV_SCHED is set. Change-Id: I9df0078c308794f82acdf767cfcdddafdc1a640e Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
2015-10-05  WIP: sched/{fair,tune}: add per task boost signal  (Patrick Bellasi)
When per task boosting is enabled, all the CPU and task specific signals must be boosted according to the specific boost value defined by the boost group assigned to the task. This patch updates all the CFS scheduler consumers of the task "utilization" signal and the CPU "usage" signal to use the boost value defined by the boost group assigned to the task. Change-Id: I4348b387d66a7a8f458deca426c928d0341cf7a6 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: sched/{fair,tune}: track RUNNABLE tasks impact on per CPU boost value  (Patrick Bellasi)
When per-task boosting is enabled, every time a task enters/exits a CPU its boost value could impact the currently selected OPP for that CPU. Thus, the "aggregated" boost value for that CPU potentially needs to be updated to match the maximum boost value among all the tasks currently RUNNABLE on that CPU. This patch introduces the required support to keep track of which boost groups are impacting a CPU. Each time a task is enqueued/dequeued to/from a CPU, its boost group is used to increment/decrement a per-CPU counter of RUNNABLE tasks on that CPU. A boost group changes its effect on a CPU only when its number of runnable tasks there becomes 1 or 0, specifically:
a) boost_group::tasks == 1: this boost group starts to impact the CPU
b) boost_group::tasks == 0: this boost group stops impacting the CPU
In each of these two conditions the aggregation function sched_cpu_update(cpu) could be required to run in order to identify the new maximum boost value required for the CPU. The proposed patch keeps the number of executions of the aggregation function to a minimum while still ensuring that a CPU is always boosted to the maximum boost value required by all its currently RUNNABLE tasks. Change-Id: I5c85eda78fa061fd9dea486b331d93b3adee8159 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
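A minimal sketch of the 0/1-transition counting described above (BOOSTGROUPS_COUNT and the data layout are illustrative, not the patch's actual code; one CPU is modelled for brevity):

	#include <stdio.h>

	#define BOOSTGROUPS_COUNT 5

	/* RUNNABLE task count per boost group on this CPU. */
	static int bg_tasks[BOOSTGROUPS_COUNT];

	/* Recompute the CPU's aggregated boost (see the next sketch). */
	static void schedtune_cpu_update(void)
	{
		printf("aggregation re-run\n");
	}

	static void schedtune_enqueue_task(int bg)
	{
		/* A group starts impacting the CPU only on 0 -> 1. */
		if (++bg_tasks[bg] == 1)
			schedtune_cpu_update();
	}

	static void schedtune_dequeue_task(int bg)
	{
		/* ...and stops impacting it only on 1 -> 0. */
		if (--bg_tasks[bg] == 0)
			schedtune_cpu_update();
	}

	int main(void)
	{
		schedtune_enqueue_task(2); /* re-runs aggregation   */
		schedtune_enqueue_task(2); /* no aggregation needed */
		schedtune_dequeue_task(2); /* no aggregation needed */
		schedtune_dequeue_task(2); /* re-runs aggregation   */
		return 0;
	}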
2015-10-05  WIP: sched/tune: compute and keep track of per CPU boost value  (Patrick Bellasi)
When per task boosting is enabled, we could have multiple RUNNABLE tasks concurrently scheduled on the same CPU, each one with a different boost value. For example, we could have a scenario like this:
	Task  SchedTune CGroup  Boost Value
	T1    root                    0
	T2    low-priority           10
	T3    interactive            90
In these conditions we expect a CPU to be configured according to a proper "aggregation" of the boost values required by all the tasks currently scheduled on that CPU. A suitable aggregation function is one which tracks the MAX boost value among all the tasks RUNNABLE on a CPU. This approach always satisfies the most boost-demanding task while at the same time:
a) boosting all the concurrently scheduled tasks, thus reducing potential co-scheduling side-effects on demanding tasks
b) reducing the frequency switching required by SchedDVFS, thus being more friendly to architectures with slow frequency switching times
Every time a task enters/exits the RQ of a CPU, the max boost value should potentially be updated considering all the boost groups currently "affecting" that CPU, i.e. those which have at least one RUNNABLE task currently allocated on that CPU. This patch introduces the required support to keep track of the boost groups currently affecting each CPU. The provided implementation is quite simple. Indeed, thanks to the limited number of boost groups allocatable on a system, a small and memory efficient per-CPU array of boost group values (cpu_boost_groups) is used. Each CPU entry is updated by schedtune_boostgroup_update(), but only when a schedtune CGroup boost value is updated. This is expected to be an infrequent operation, perhaps done just once at system boot. Change-Id: I4065438aa3b24731508dbb4ce4ce0278b306dacc Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
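A minimal sketch of the MAX-aggregation described above, using the T1/T2/T3 example (one CPU shown; names and layout are illustrative):

	#include <stdio.h>

	#define BOOSTGROUPS_COUNT 5

	/* Per-group boost value (set via the schedtune CGroup) and
	 * per-CPU count of RUNNABLE tasks in each group. */
	static int bg_boost[BOOSTGROUPS_COUNT] = { 0, 10, 90 };
	static int bg_tasks[BOOSTGROUPS_COUNT];

	static int cpu_boost; /* aggregated boost applied to the CPU */

	/* MAX-aggregation: boost the CPU for its most demanding
	 * RUNNABLE task, i.e. the highest boost among non-empty groups. */
	static void schedtune_cpu_update(void)
	{
		int boost_max = 0;

		for (int idx = 0; idx < BOOSTGROUPS_COUNT; idx++) {
			if (bg_tasks[idx] > 0 && bg_boost[idx] > boost_max)
				boost_max = bg_boost[idx];
		}
		cpu_boost = boost_max;
	}

	int main(void)
	{
		bg_tasks[0] = 1; /* T1: root group,         boost  0 */
		bg_tasks[1] = 1; /* T2: low-priority group, boost 10 */
		bg_tasks[2] = 1; /* T3: interactive group,  boost 90 */
		schedtune_cpu_update();
		printf("CPU boost: %d\n", cpu_boost); /* 90 */
		return 0;
	}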
2015-10-05  WIP: sched/{fair,tune}: add initial support for CGroups based boosting  (Patrick Bellasi)
To support boosting task performance while still operating in energy-aware mode, the previous patches introduced a system-wide "knob" which allows tuning how much the system is optimized for energy efficiency vs performance. The usage of a single knob has the advantage of being a simple solution, both from the implementation and the usage standpoint. However, on a real system it is in general difficult to identify a single value for the knob which fits multiple different tasks. For example, some kernel threads and/or user-space background services are better always managed in an energy efficient way, while it should still be possible to boost the performance of specific workloads. In order to improve the flexibility of the task boosting mechanism, this patch is the first of a small series which extends the previous implementation to introduce "per task group" support. This patch adds the new "schedtune" CGroups controller, which allows configuring a different boost value for each group of tasks. To keep the implementation simple while still being effective as a boosting strategy, the new controller:
1. allows only a two layer hierarchy
2. supports only a small number of boost groups
A two layer hierarchy allows each task to be placed either:
a) in the root control group, thus being subject to the system-wide boost value
b) in a child of the root group, thus being subject to the specific boost value defined by that "boost group"
This decision is based on the observation that in general it is difficult to define a clear user-space semantic for nested groups. The limited number of "boost groups" is mainly motivated by the observation that on a real system only a few classes of tasks really deserve different treatment, for example background vs foreground or interactive vs low-priority. As an additional benefit, a limited number of boost groups also allows a simpler implementation, especially for the code required to compute the boost value for CPUs which have runnable tasks belonging to different boost groups. This first patch introduces just the basic CGroups support as well as an updated version of the schedtune_accept_deltas() function, which filters the energy-diff value considering the specific boost value assigned to the task being evaluated. Since CPU boosting deserves a new set of updates with respect to the system-wide approach, it is going to be introduced by a following patch. Change-Id: Id712e313a1b9d038035e218fe0ce272f68eb60ee Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: sched/fair: filter energy_diff() based on energy_payoff value  (Patrick Bellasi)
Once the SchedTune support is enabled and the CPU bandwidth demand of a task is boosted, we can expect increased energy consumption, balanced by a corresponding increase in task performance. However, the current implementation of the energy_diff() function accepts all and _only_ the schedule candidates which result in a reduced expected system energy, which works against the boosting strategy. This patch links the energy_diff() function with the "energy payoff" engine provided by SchedTune. The energy variation computed by the energy_diff() function is now filtered using the SchedTune support to evaluate the energy payoff for a boosted task. With this patch, the energy_diff() function reports as "acceptable schedule candidates" (i.e. technically, with a negative energy_diff) all and only the schedule candidates which correspond to a positive energy_payoff. Change-Id: Iafacbbc5bfb3e7888d30c3796cde6f2ee8633b89 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: sched/tune: add performance-energy space filtering function  (Patrick Bellasi)
Boosting the utilisation of a task could imply the selection of a CPU/OPP and cluster which has a higher energy impact. In those conditions the energy_diff() function would discard the scheduling proposal under evaluation, thus preventing the performance of a task from being boosted. A trade-off between increased energy consumption and performance benefits should be evaluated instead. This patch introduces the required support to evaluate the "energy payoff" corresponding to a certain variation in expected task performance. A proposed schedule candidate is described in terms of the expected variation of energy and performance. These values are used to verify in which region of the Performance-Energy Space the solution falls, and thus to return an energy_payoff metric which is positive if:
- the increased energy is compensated by a configured performance gain, i.e. in the Optimal (O) or Performance Boost (B) region
- the decreased energy is compliant with a configured performance constraint, i.e. in the Performance Constraint (C) region
Otherwise the returned energy_payoff will be negative, i.e. not in the above cases or in the Suboptimal (S) region. The classification of a schedule decision into one of the four regions is defined by a couple of thresholds in the Performance-Energy variation space, which identify two main optimization subregions:
- Performance Boost (B) region, top-right quadrant: both nrg and cap expected variations are positive
- Performance Constraint (C) region, bottom-left quadrant: both nrg and cap expected variations are negative
This implementation binds both threshold values to the single value of the boost margin, thus allowing a single boost knob to configure all the thresholds. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Conflicts: kernel/sched/tune.c Change-Id: I1e77b49135c041c18e80de2090069e0aab0ec4ad
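A minimal sketch of the quadrant classification described above (the threshold values and the ratio test are illustrative stand-ins; the patch derives both thresholds from the single boost-margin knob):

	#include <stdio.h>

	/* Illustrative thresholds, in percent. */
	static int perf_boost_threshold = 50;
	static int perf_constrain_threshold = 50;

	/* Classify a candidate by its expected energy (nrg_delta) and
	 * performance (perf_delta) variations; returns 1 for a positive
	 * energy payoff. */
	static int schedtune_accept_deltas(int nrg_delta, int perf_delta)
	{
		/* Optimal (O): more performance for less (or equal) energy. */
		if (nrg_delta <= 0 && perf_delta >= 0)
			return 1;
		/* Suboptimal (S): less performance for more energy. */
		if (nrg_delta >= 0 && perf_delta <= 0)
			return 0;
		/* Performance Boost (B): top-right quadrant; pay energy
		 * only for a big enough performance gain. */
		if (nrg_delta > 0)
			return perf_delta * 100 >= nrg_delta * perf_boost_threshold;
		/* Performance Constraint (C): bottom-left quadrant; save
		 * energy only if performance does not drop too much. */
		return nrg_delta * 100 <= perf_delta * perf_constrain_threshold;
	}

	int main(void)
	{
		printf("%d\n", schedtune_accept_deltas(100, 80)); /* B: accepted */
		printf("%d\n", schedtune_accept_deltas(100, 20)); /* B: rejected */
		return 0;
	}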
2015-10-05  WIP: sched/tune: add support to compute normalized energy  (Patrick Bellasi)
The current EAS implementation not only considers energy variations alone, completely disregarding the impact on performance when selecting a schedule candidate, but it also computes "absolute" energy variations. In order to properly define a trade-off strategy between increased energy consumption and performance benefits, energy variations must be compared with performance variations, and thus both metrics must be expressed in comparable units. While performance variations are expressed in terms of capacity deltas, which are in the range [0..SCHED_LOAD_SCALE], the same scale is not used for energy variations. This patch introduces a new support function:
	schedtune_normalize_energy(energy_diff)
which returns a normalized value in the same range as capacity variations, i.e. [0..SCHED_LOAD_SCALE]. NOTE: energy normalization requires some data from the Energy Model (EM) of the specific target. Since these values are expected to be provided at boot time along with the EM itself, for the time being this patch hard-codes the values of the same ARM TC2 EM provided by the previous EAS patches. Change-Id: I0cf3f91b6277911a2f1f796d03661fc2c04ab2df Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
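A minimal sketch of the normalization described above (the energy-range bound nrg_range_max is an illustrative hard-coded value, standing in for the EM data the patch hard-codes for TC2):

	#include <stdio.h>

	#define SCHED_LOAD_SCALE 1024

	/* Maximum energy delta the EM can produce for one scheduling
	 * decision; illustrative value. */
	static const int nrg_range_max = 2000;

	/* Map an absolute energy delta into [0..SCHED_LOAD_SCALE] so it
	 * can be compared against capacity deltas. */
	static int schedtune_normalize_energy(int energy_diff)
	{
		long long tmp = (long long)energy_diff * SCHED_LOAD_SCALE;
		return (int)(tmp / nrg_range_max);
	}

	int main(void)
	{
		printf("%d\n", schedtune_normalize_energy(500)); /* 256 */
		return 0;
	}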
2015-10-05  WIP: sched/fair: keep track of energy/capacity variations  (Patrick Bellasi)
The current EAS implementation does not allow task performance "to be boosted", for example by running a task at a higher OPP (or on a more capable CPU), even if that would require a "reasonable" increase in energy consumption. To define how reasonable an energy increase is with respect to a required boost value, a trade-off between the expected energy and performance variations must be defined and computed. However, the current EAS implementation considers only energy variations, completely disregarding the impact on performance when selecting a schedule candidate. This patch extends the eenv energy environment to keep track of both the energy and the performance deltas implied by the activation of a schedule candidate. The performance variation is estimated considering the different capacities of the CPUs on which the task could be scheduled. The idea is that, while running on a CPU with higher capacity (e.g. a higher operating point), the task could (potentially) complete faster and thus get better performance. Change-Id: I7c25eaaa17f6e7c9e13659aadc52bbc7ba125cc6 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: sched/fair: add boosted CPU usage  (Patrick Bellasi)
The CPU usage signal is used by EAS as an estimation of the overall bandwidth currently allocated on a CPU. When SchedDVFS is in use, this signal affects the selection of the operating point (OPP) required to accommodate all the workload allocated to a CPU. A convenient and minimally intrusive way to boost the performance of the tasks running on a CPU is to boost the CPU usage signal each time it is used to select an OPP. This patch introduces a new function:
	get_boosted_cpu_usage(cpu)
which returns a boosted value for the usage of the specified CPU. The margin added to the original usage is:
1. computed by the boosting strategy introduced by a previous patch
2. proportional to the system-wide boost value defined by the sysctl interface, also introduced by a previous patch
The boosted signal is used transparently by SchedDVFS each time it needs an estimation of the capacity required by a CPU. Change-Id: I4a3612c7ceddb8b68a1896d05ff3407cb5bf8141 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: sched/fair: add boosted task utilization  (Patrick Bellasi)
The task utilization signal, which is derived from PELT signals and properly scaled to be architecture and frequency invariant, is used by EAS as an estimation of the task requirements in terms of CPU bandwidth. This signal affects both the CPU selection and, when SchedDVFS (the scheduler controlled CPUFreq governor) is in use, the selection of the current operating point (OPP) for the CPU. A convenient and minimally intrusive way to bias these decisions is to boost the task utilization signal each time it is used to support them. This patch introduces the new function:
	boosted_task_utilization(task)
which returns a boosted value for the utilization of the specified task. The margin added to the original utilization is:
1. computed by the boosting strategy introduced by a previous patch
2. proportional to the system-wide boost value defined by the sysctl interface, also introduced by a previous patch
The boosted signal is used by EAS:
a. transparently, via its integration into the task_fits() function
b. explicitly, in the energy-aware wakeup path
Change-Id: I032041cfd4406bb050bb2190c0332935ece5592a Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: sched/fair: add function to convert boost value into "margin"  (Patrick Bellasi)
The basic idea of the boost knob is to "artificially inflate" some signals to make a task/RQ appear more demanding than it actually is. Independently of the specific signal, a consistent and possibly simple semantic for the concept of "signal boosting" should be defined. Such a semantic must define:
1. how the boost percentage is translated into a "margin" value to be added to the original signal being inflated
2. what a boost value means from a user-space perspective
This patch provides the implementation of a possible boost semantic, named Signal Proportional Compensation (SPC), where the boost percentage (BP) is used to compute a margin (M) which is proportional to the complement of the original signal (OS):
	M = BP * (1024 - OS)
The computed margin is then added to the OS to obtain the Boosted Signal (BS):
	BS = OS + M
The proposed boost semantic has these main features:
- each signal gets a boost which is proportional to its delta with respect to the maximum available capacity in the system
- a 100% boost has a clear meaning from a user-space perspective, since it simply means running (possibly) "all" tasks at the max OPP
- each boost value improves task performance by a quantity which is proportional to the maximum achievable performance on that system
This semantics thus enforces a behaviour whereby, for example, a 50% boost means running half-way between the current and the maximum performance a task could achieve on that system. This patch provides the code to implement a fast integer division to convert a boost percentage (BP) into a margin (M). NOTE: this code is suitable for all signals operating in the range [0..SCHED_LOAD_SCALE]. Change-Id: I0e5bf863e7844300ef3744cff02d97502669949a Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
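A minimal sketch of the SPC semantic described above (a plain division stands in for the patch's fast reciprocal divide; the function names are illustrative):

	#include <stdio.h>

	#define SCHED_LOAD_SCALE 1024

	/* SPC margin: M = BP% * (SCHED_LOAD_SCALE - OS) */
	static unsigned long schedtune_margin(unsigned long signal,
					      unsigned int boost_pct)
	{
		return boost_pct * (SCHED_LOAD_SCALE - signal) / 100;
	}

	/* BS = OS + M */
	static unsigned long boosted_signal(unsigned long signal,
					    unsigned int boost_pct)
	{
		return signal + schedtune_margin(signal, boost_pct);
	}

	int main(void)
	{
		/* A half-utilized signal with a 50% boost lands half-way
		 * between its current value and the maximum. */
		printf("%lu\n", boosted_signal(512, 50)); /* 768 */
		return 0;
	}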
2015-10-05  WIP: sched/tune: add sysctl interface to define a boost value  (Patrick Bellasi)
The energy-aware scheduler extension has been designed to exploit an energy model to support energy efficient allocation of tasks on the available CPUs. The main goal of the current implementation is to schedule tasks in such a way as to minimise the expected system energy while still meeting the tasks' requirements in terms of computational demand. Thus, the current implementation does not allow task performance "to be boosted", for example by running a task at a higher OPP (or on a more capable CPU), even if that would require a "reasonable" increase in energy consumption. To support boosting task performance while still operating in energy-aware mode, the scheduler should provide a "knob" which allows tuning how much the system is optimised for energy efficiency vs performance. This patch is the first of a series which provides a simple sysctl based interface to define an EAS tuning knob. For the time being, just one system-wide "boost" tunable is exposed via:
	/proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100] to define a percentage where:
- 0% boost means operating in "standard" EAS mode, scheduling tasks at the minimum capacities required by the workload demand
- 100% boost means pushing task performance to the maximum, "regardless" of the incurred energy consumption
A boost value between these two boundaries is used to bias the power/performance trade-off: the higher the boost value, the more the EAS scheduler is biased toward performance boosting instead of energy efficiency. Change-Id: I1fb22390aee04e8f1a55a9f30db505d9040ec693 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: sched/tune: add detailed documentation  (Patrick Bellasi)
The SchedTune EAS module introduces support which allows EAS to be tuned at run-time to optimize more for energy efficiency or for task performance boosting. This patch provides a detailed description of the motivations and design decisions behind the implementation of the SchedTune EAS module. Change-Id: I37ea9c33eb54f9eae594f87772ff77b9d2606ab3 Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2015-10-05  WIP: arm64, topology: Updates to use DT bindings for EAS costing data  (Robin Randhawa)
With the bindings and the associated accessors to extract data from the bindings in place, remove the static hard-coded data from topology.c and use the accessors instead. Change-Id: Id2e68b26a5a7b33ec0b3dba8779bf1a2451c4abe Signed-off-by: Robin Randhawa <robin.randhawa@arm.com>
2015-10-05  WIP: sched: Support for extracting EAS energy costs from DT  (Robin Randhawa)
This patch implements support for extracting energy cost data from DT. The data should conform to the DT bindings for energy cost data needed by EAS (energy aware scheduling). Change-Id: Ia435bd4d4b111bb6257ffb2f5385b5f4b70d5aa6 Signed-off-by: Robin Randhawa <robin.randhawa@arm.com>
2015-10-05  WIP: Documentation: DT bindings for energy model cost data required by EAS  (Robin Randhawa)
EAS (energy aware scheduling) provides the scheduler with an alternative objective - energy efficiency - as opposed to its current performance oriented objectives. EAS relies on a simple platform energy cost model to guide scheduling decisions. The model only considers the CPU subsystem. This patch adds documentation describing the DT bindings that should be used to supply the scheduler with an energy cost model. Change-Id: I312c8d2f46d3aed0b8f39bd6e4f1739699bc5944 Signed-off-by: Robin Randhawa <robin.randhawa@arm.com>
2015-10-05  WIP: arm64: Cpu invariant scheduler load-tracking support  (Juri Lelli)
arm64 counterpart of the arm bits, with some variations. Use the max cap states for each type of CPU to set up cpu_scale. Change-Id: Ib33b5fa379d520ff84985bca8ecd2257ef0fcab9 Signed-off-by: Juri Lelli <juri.lelli@arm.com>
2015-10-05  WIP: arm64: Frequency invariant scheduler load-tracking support  (Juri Lelli)
arm64 counterpart of the arm bits. Implements the arch-specific function to provide the scheduler with a frequency scaling correction factor for more accurate load-tracking. The factor is:
	(current_freq(cpu) << SCHED_CAPACITY_SHIFT) / max_freq(cpu)
This implementation only provides frequency invariance. No micro-architecture invariance yet. Change-Id: I3d814abffdf0e67c5c2cc0df3e2446584f5468ae Signed-off-by: Juri Lelli <juri.lelli@arm.com>
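A minimal sketch of the factor above (the per-cpu frequency tables are illustrative; the real values come from cpufreq):

	#include <stdio.h>

	#define SCHED_CAPACITY_SHIFT 10

	/* Illustrative per-cpu frequencies, in kHz. */
	static unsigned long curr_freq[2] = {  600000, 1200000 };
	static unsigned long max_freq[2]  = { 1200000, 1200000 };

	/* Frequency scaling correction factor, fixed point with
	 * SCHED_CAPACITY_SHIFT fractional bits. */
	static unsigned long arch_scale_freq_capacity(int cpu)
	{
		return (curr_freq[cpu] << SCHED_CAPACITY_SHIFT) / max_freq[cpu];
	}

	int main(void)
	{
		/* cpu0 at half its max frequency -> 512 (i.e. 0.5);
		 * cpu1 at max frequency -> 1024 (i.e. 1.0). */
		printf("%lu %lu\n", arch_scale_freq_capacity(0),
		       arch_scale_freq_capacity(1));
		return 0;
	}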
2015-10-05  WIP: sched: Consider misfit tasks when load-balancing  (Morten Rasmussen)
With the new group_misfit_task load-balancing scenario, additional policy conditions are needed when load-balancing. Misfit task balancing only makes sense between a source group of lower capacity and a target group of higher capacity. If capacities are the same, fall back to normal group_other balancing. The aim is to balance tasks such that no task has its throughput hindered by compute capacity while a cpu with more capacity is available. Load-balancing is generally based on average load in the sched_groups, but for misfit tasks it is necessary to introduce exceptions to migrate tasks against the usual metrics and optimize throughput. This patch ensures the following load-balance behaviour for mixed capacity systems (e.g. ARM big.LITTLE) running always-running tasks:
1. Place a task on each cpu, starting in order from cpus with highest capacity to lowest, until all cpus are in use (i.e. one task on each cpu).
2. Once all cpus are in use, balance according to compute capacity such that load per capacity is approximately the same regardless of the compute capacity (i.e. big cpus get more tasks than little cpus).
The necessary changes are introduced in find_busiest_group(), calculate_imbalance(), and find_busiest_queue(). This includes passing the group_type on to find_busiest_queue() through struct lb_env, which currently considers only the imbalance and not the imbalance situation (group_type). To avoid taking remote rq locks to examine source sched_groups for misfit tasks, each cpu is responsible for tracking misfit tasks itself and updating the rq->misfit_task flag. This means checking task utilization when tasks are scheduled and on each sched_tick. Change-Id: I458461cebf269d6d4eeac6f83e4c84f4e4d7a9dd Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2015-10-05  WIP: sched: Add group_misfit_task load-balance type  (Morten Rasmussen)
To maximize throughput in systems with reduced capacity cpus (e.g. high RT/IRQ load and/or ARM big.LITTLE), load-balancing has to consider task and cpu utilization as well as per-cpu compute capacity, in addition to the current average load based load-balancing policy. Tasks scheduled on a reduced capacity cpu need to be identified and migrated to a higher capacity cpu if possible. To implement this additional policy, an additional group_type (load-balance scenario) is added: group_misfit_task. This represents scenarios where a sched_group has tasks that are not suitable for its per-cpu capacity. group_misfit_task is only considered if the system is not overloaded in any other way (group_imbalanced or group_overloaded). Identifying misfit tasks requires the rq lock to be held. To avoid taking remote rq locks to examine source sched_groups for misfit tasks, each cpu is responsible for tracking misfit tasks itself and updating the rq->misfit_task flag. This means checking task utilization when tasks are scheduled and on each sched_tick. Change-Id: I092a348ed0ff37eae123f0d8d6dcf1435d51bfb1 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
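A minimal sketch of the per-cpu misfit tracking described above (the fit margin and helper names are illustrative, not the patch's actual code):

	#include <stdbool.h>
	#include <stdio.h>

	#define CAPACITY_MARGIN_PCT 80 /* task "fits" below 80% of capacity */

	struct rq { bool misfit_task; };

	static bool task_fits_capacity(unsigned long task_util,
				       unsigned long cpu_capacity)
	{
		return task_util * 100 < cpu_capacity * CAPACITY_MARGIN_PCT;
	}

	/* Run for the current task at schedule/sched_tick time, so no
	 * remote rq lock is ever needed. */
	static void update_misfit_status(struct rq *rq, unsigned long task_util,
					 unsigned long cpu_capacity)
	{
		rq->misfit_task = !task_fits_capacity(task_util, cpu_capacity);
	}

	int main(void)
	{
		struct rq rq = { false };
		update_misfit_status(&rq, 900, 430); /* big task, LITTLE cpu */
		printf("misfit: %d\n", rq.misfit_task); /* 1 */
		return 0;
	}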
2015-10-05  WIP: sched: Add per-cpu max capacity to sched_group_capacity  (Morten Rasmussen)
struct sched_group_capacity currently represents the compute capacity sum of all cpus in the sched_group. Unless it is divided by the group_weight to get the average capacity per cpu, it hides differences in cpu capacity for mixed capacity systems (e.g. high RT/IRQ utilization or ARM big.LITTLE). But even the average may not be sufficient if the group covers cpus of different capacities. Instead, by extending struct sched_group_capacity to indicate the max per-cpu capacity in the group, a suitable group for a given task utilization can easily be found such that cpus with reduced capacity can be avoided for tasks with high utilization (not implemented by this patch). Change-Id: I3ad0e6df855b1a184db05cb310e91e1e03061467 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2015-10-05  WIP: sched: Do eas idle balance regardless of the rq avg idle value  (Dietmar Eggemann)
EAS relies on idle balance to migrate a misfit task towards a cpu with higher capacity. When such a cpu becomes idle, idle balance should happen even if the rq avg idle is smaller than the sched migration cost (default 500us). The rq avg idle is updated during the wakeup of a task if the rq has a non-null idle_stamp. This value stays unchanged and valid until the next task wakes up on this cpu after an idle period. So the rq avg idle could be smaller than the sched migration cost, preventing the idle balance from happening. In this case we would be at the mercy of wakeup, periodic or nohz-idle load balancing to put another task on this cpu. To break this dependency, make the EAS idle balance independent of the requirement that rq avg idle be larger than the sched migration cost. Change-Id: I880a25180062444d72947461d976dc44f9672f13 Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2015-10-05  FROMLIST: sched/fair: cpufreq_sched triggers for load balancing  (Juri Lelli)
As we don't trigger freq changes from {en,de}queue_task_fair() during load balancing, we need to do so explicitly on the load balancing paths. cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@arm.com> (am from https://patchwork.kernel.org/patch/6737901) Signed-off-by: Juri Lelli <juri.lelli@arm.com> Change-Id: I43466dfc1b4d93998ada9038a9f9ed14892c2a84
2015-10-05  FROMLIST: sched/cpufreq_sched: modify pcpu_capacity handling  (Juri Lelli)
Use the cpu argument of cpufreq_sched_set_cap() to handle per_cpu writes, as the function can be called remotely (e.g., from load balancing code). cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@arm.com> (am from https://patchwork.kernel.org/patch/6737941) Signed-off-by: Juri Lelli <juri.lelli@arm.com> Change-Id: Ie2fba4c77a7d60e2b69909a5844672cad2ad1cbe
2015-10-05  FROMLIST: sched/fair: jump to max OPP when crossing UP threshold  (Juri Lelli)
Since the true utilization of a long running task is not detectable while it is running and might be bigger than the current cpu capacity, create maximum cpu capacity head room by requesting the maximum cpu capacity once the cpu usage plus the capacity margin exceeds the current capacity. This is also done to harm the performance of the task as little as possible. cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@arm.com> (am from https://patchwork.kernel.org/patch/6738071) Signed-off-by: Juri Lelli <juri.lelli@arm.com> Change-Id: Ic7cded62657d5932adb202a6946d5cccd99ea4bb
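A minimal sketch of the UP-threshold trigger described above (the margin value and the fallback request are illustrative assumptions):

	#include <stdio.h>

	#define CAPACITY_MARGIN 256 /* headroom, same scale as capacity */

	static unsigned long requested_capacity(unsigned long cpu_usage,
						unsigned long capacity_curr,
						unsigned long capacity_max)
	{
		/* Crossing the UP threshold: jump straight to the maximum
		 * instead of stepping through intermediate OPPs. */
		if (cpu_usage + CAPACITY_MARGIN > capacity_curr)
			return capacity_max;
		return cpu_usage + CAPACITY_MARGIN;
	}

	int main(void)
	{
		printf("%lu\n", requested_capacity(600, 768, 1024)); /* 1024 */
		printf("%lu\n", requested_capacity(400, 768, 1024)); /* 656  */
		return 0;
	}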
2015-10-05  FROMLIST: sched/{fair,cpufreq_sched}: add reset_capacity interface  (Juri Lelli)
When a CPU is going idle it is pointless to ask for an OPP update, as we would wake up another task only to request the same capacity we are already running at (the utilization gets moved to blocked_utilization). We thus add the cpufreq_sched_reset_capacity() interface to simply reset our current capacity request without triggering any real update. At wakeup we will use the decayed utilization to select an appropriate OPP. cc: Ingo Molnar <mingo@redhat.com> cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@arm.com> (am from https://patchwork.kernel.org/patch/6738101) Signed-off-by: Juri Lelli <juri.lelli@arm.com> Change-Id: Ia1240a9363889498e604ffce5892c554811711fb