aboutsummaryrefslogtreecommitdiff
path: root/kernel/time
AgeCommit message (Collapse)Author
2013-04-19nohz: Force boot CPU outside full dynticks rangeFrederic Weisbecker
The timekeeping job must be able to run early on boot because there may be some pre-SMP (and thus pre-initcalls ) components that rely on it. The IO-APIC is one such users as it tests the timer health by watching jiffies progression. Given that it happens before we know the initial online set, we can't rely on it to select a timekeeper. We need one before SMP time otherwise we simply crash on boot. To fix this and keep things simple for now, force the boot CPU outside of the full dynticks range in any case and do this early on kernel parameter parsing time. We might want a trickier solution later, expecially for aSMP architectures that need to assign housekeeping tasks to arbitrary low power CPUs. But it's still first pass KISS time for now. Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-18nohz: New APIs to re-evaluate the tick on full dynticks CPUsFrederic Weisbecker
Provide two new helpers in order to notify the full dynticks CPUs about some internal system changes against which they may reconsider the state of their tick. Some practical examples include: posix cpu timers, perf tick and sched clock tick. For now the notifying handler, implemented through IPIs, is a stub that will be implemented when we get the tick stop/restart infrastructure in. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-17clockevents: Switch into oneshot mode even if broadcast registered lateStephen Boyd
tick_oneshot_notify() is used to notify a particular CPU to try to switch into oneshot mode after a oneshot capable tick device is registered and tick_clock_notify() is used to notify all CPUs to try to switch into oneshot mode after a high res clocksource is registered. There is one caveat; if the tick devices suffer from FEAT_C3_STOP we don't try to switch into oneshot mode unless we have a oneshot capable broadcast device already registered. If the broadcast device is registered after the tick devices that have FEAT_C3_STOP we'll never try to switch into oneshot mode again, causing us to be stuck in periodic mode forever. Avoid this scenario by calling tick_clock_notify() after we register the broadcast device so that we try to switch into oneshot mode on all CPUs one more time. [ tglx: Adopted to timers/core and added a comment ] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1366219566-29783-1-git-send-email-sboyd@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-17timer_list: Convert timer list to be a proper seq_fileNathan Zimmer
When running with 4096 cores attemping to read /proc/timer_list will fail with an ENOMEM condition. On a sufficantly large systems the total amount of data is more then 4mb, so it won't fit into a single buffer. The failure can also occur on smaller systems when memory fragmentation is high as reported by Dave Jones. Convert /proc/timer_list to a proper seq_file with its own iterator. This is a little more complex given that we have to make two passes with two separate headers. sysrq_timer_list_show also needed to be updated to reflect the fact that now timer_list_show only does one cpu at at time. Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Reported-by: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1364345790-14577-3-git-send-email-nzimmer@sgi.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-17timer_list: Split timer_list_show_tickdevicesNathan Zimmer
Split timer_list_show_tickdevices() into the header printout and pull the rest up to timer_list_show. This is a preparatory patch for converting timer_list to a proper seqfile with its own iterator Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Reported-by: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1364345790-14577-2-git-send-email-nzimmer@sgi.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-15nohz: Improve a bit the full dynticks Kconfig documentationFrederic Weisbecker
Remove the "single task" statement from CONFIG_NO_HZ_FULL title. The constraint can be invalidated when tasks from other sched classes than SCHED_FAIR are running. Moreover it's possible that hrtick join the party in the future. Also add a line about the dependency on SMP. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-15nohz: Align periodic tick Kconfig with other choices' naming conventionFrederic Weisbecker
Rename CONFIG_PERIODIC_HZ to CONFIG_HZ_PERIODIC in order to stay consistent with other tick implementation entries: CONFIG_HZ_PERIODIC CONFIG_NO_HZ_IDLE CONFIG_NO_HZ_FULL Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-15nohz: Switch from "extended nohz" to "full nohz" based namingFrederic Weisbecker
"Extended nohz" was used as a naming base for the full dynticks API and Kconfig symbols. It reflects the fact the system tries to stop the tick in more places than just idle. But that "extended" name is a bit opaque and vague. Rename it to "full" makes it clearer what the system tries to do under this config: try to shutdown the tick anytime it can. The various constraints that prevent that to happen shouldn't be considered as fundamental properties of this feature but rather technical issues that may be solved in the future. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-15nohz: Fix old dynticks idle Kconfig backward compatibilityFrederic Weisbecker
In order to enforce backward compatibility with older config files, we want the new dynticks-idle Kconfig entry to default its value to the one of the old CONFIG_NO_HZ symbol if present. Namely we want: config NO_HZ # old obsolete dynticks idle symbol bool config NO_HZ_IDLE # new dynticks idle symbol default NO_HZ However Kconfig prevents this to work if the old symbol is not visible. And this is currently the case because NO_HZ lacks a title in order to show it in make oldconfig and alike. To fix this, bring a minimal title and help text to the obsolete Kconfig entry that explains its purpose. This makes the "defaulting" to work. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-11timekeeping: Make sure to notify hrtimers when TAI offset changesJohn Stultz
Now that we have CLOCK_TAI timers, make sure we notify hrtimer code when TAI offset is changed. Signed-off-by: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1365622909-953-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-04timekeeping: Shorten seq_count regionThomas Gleixner
Shorten the seqcount write hold region to the actual update of the timekeeper and the related data (e.g vsyscall). On a contemporary x86 system this reduces the maximum latencies on Preempt-RT from 8us to 4us on the non-timekeeping cores. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04timekeeping: Implement a shadow timekeeperThomas Gleixner
Use the shadow timekeeper to do the update_wall_time() adjustments and then copy it over to the real timekeeper. Keep the shadow timekeeper in sync when updating stuff outside of update_wall_time(). This allows us to limit the timekeeper_seq hold time to the update of the real timekeeper and the vsyscall data in the next patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04timekeeping: Delay update of clock->cycle_lastThomas Gleixner
For calculating the new timekeeper values store the new cycle_last value in the timekeeper and update the clock->cycle_last just when we actually update the new values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04timekeeping: Store cycle_last value in timekeeper struct as wellThomas Gleixner
For implementing a shadow timekeeper and a split calculation/update region we need to store the cycle_last value in the timekeeper and update the value in the clocksource struct only in the update region. Add the extra storage to the timekeeper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04ntp: Remove ntp_lock, using the timekeeping locks to protect ntp stateJohn Stultz
In order to properly handle the NTP state in future changes to the timekeeping lock management, this patch moves the management of all of the ntp state under the timekeeping locks. This allows us to remove the ntp_lock. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04timekeeping: Simplify tai updating from do_adjtimexJohn Stultz
Since we are taking the timekeeping locks, just go ahead and update any tai change directly, rather then dropping the lock and calling a function that will just take it again. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04timekeeping: Hold timekeepering locks in do_adjtimex and hardppsJohn Stultz
In moving the NTP state to be protected by the timekeeping locks, be sure to acquire the timekeeping locks prior to calling ntp functions. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()John Stultz
Since ADJ_SETOFFSET adjusts the timekeeping state, process it as part of the top level do_adjtimex() function in timekeeping.c. This avoids deadlocks that could occur once we change the ntp locking rules. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04ntp: Rework do_adjtimex to take timespec and tai argumentsJohn Stultz
In order to change the locking rules, we need to provide the timespec and tai values rather then having the ntp logic acquire these values itself. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04ntp: Move timex validation to timekeeping do_adjtimex call.John Stultz
Move logic that does not need the ntp state to be done in the timekeeping do_adjtimex() call. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04ntp: Move do_adjtimex() and hardpps() functions to timekeeping.cJohn Stultz
In preparation for changing the ntp locking rules, move do_adjtimex and hardpps accessor functions to timekeeping.c, but keep the code logic in ntp.c. This patch also introduces a ntp_internal.h file so timekeeping specific interfaces of ntp.c can be more limitedly shared with timekeeping.c. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-04ntp: Split out timex validation from do_adjtimexJohn Stultz
Split out the timex validation done in do_adjtimex into a separate function. This will help simplify logic in following patches. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-04-03nohz: Print final full dynticks CPUs range on bootFrederic Weisbecker
Given that we apply a few restrictions on the full dynticks CPUs range (keep an online timekeeper oustide the range, then in the future have the range be an RCU nocb CPUs subset), let's print the final resulting range of full dynticks CPUs to the user so that he knows what's really going to run. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-03nohz: Pack nohz Kconfig option in a menu of choicesFrederic Weisbecker
Now the user has the choice between three implementations of the timer tick: * Static periodic tick * Idle dynticks * Full dynticks At least for now, these are mutually exclusive choices, so let's rely on the proper Kconfig feature to display these to the user. A new entry CONFIG_NO_HZ_IDLE is created and the old CONFIG_NO_HZ maps to it for config file backward compatibility. The old name was too general now that we have more granular dynticks implementations. While at it, add some explanation to help the user on his decision between the 3 entries. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-03nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMONFrederic Weisbecker
We are planning to convert the dynticks Kconfig options layout into a choice menu. The user must be able to easily pick any of the following implementations: constant periodic tick, idle dynticks, full dynticks. As this implies a mutual exclusion, the two dynticks implementions need to converge on the selection of a common Kconfig option in order to ease the sharing of a common infrastructure. It would thus seem pretty natural to reuse CONFIG_NO_HZ to that end. It already implements all the idle dynticks code and the full dynticks depends on all that code for now. So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ. On the other hand we want to stay backward compatible: if CONFIG_NO_HZ is set in an older config file, we want to enable CONFIG_NO_HZ_IDLE by default. But we can't afford both at the same time or we run into a circular dependency: 1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select CONFIG_NO_HZ 2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE We might be able to support that from Kconfig/Kbuild but it may not be wise to introduce such a confusing behaviour. So to solve this, create a new CONFIG_NO_HZ_COMMON option which gathers the common code between idle and full dynticks (that common code for now is simply the idle dynticks code) and select it from their referring Kconfig. Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ to it for backward compatibility. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-03Merge branch 'fortglx/3.10/time' of ↵Thomas Gleixner
git://git.linaro.org/people/jstultz/linux into timers/core
2013-04-02nohz: Unhide full dynticks feature from its dependenciesFrederic Weisbecker
The full dynticks feature only shows up when all its Kconfig dependencies are met (RCU nocbs, RCU user mode, ...) This is far from being user friendly as those who want to activate this feature need to look into the Kconfig files and iterate through each dependency then activate these by hand in order to show and select the full dynticks Kconfig option. So process the other way around: show up the Kconfig option if the minimal low level dependencies are met and activate the high level ones when we enable the feature. Note there is one exception in the picture: CONFIG_VIRT_CPU_ACCOUNTING_GEN is part of a Kconfig choice menu and it appears we can't select it from another Kconfig selection when it's under such layout. So for now this particular item stays as a passive dependency. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-03-25timekeeping: __timekeeping_set_tai_offset can be staticFengguang Wu
Yet again, the kbuild test robot saves the day, noting I left out defining __timekeeping_set_tai_offset as static. It even sent me this patch. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-25tick: Change log level of NOHZ: local_softirq_pending messageRado Vrbovsky
The "NOHZ: local_softirq_pending" message is a largely informational message. This makes extra work for customers that have a policy of investigating all kernel log messages logged at <= KERN_ERR log level. This patch sets the message to a different log level. [ tglx: Use pr_warn() ] Signed-off-by: Rado Vrbovsky <rvrbovsk@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/2037057938.893524.1360345050772.JavaMail.root@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-03-22timekeeping: Split timekeeper_lock into lock and seqcountThomas Gleixner
We want to shorten the seqcount write hold time. So split the seqlock into a lock and a seqcount. Open code the seqwrite_lock in the places which matter and drop the sequence counter update where it's pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [jstultz: Merge fixups from CLOCK_TAI collisions] Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-22timekeeping: Move lock out of timekeeper structThomas Gleixner
Make the lock a separate entity. Preparatory patch for shadow timekeeper structure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [Merged with CLOCK_TAI changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-22timekeeping: Make jiffies_lock internalThomas Gleixner
Nothing outside of the timekeeping core needs that lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-22timekeeping: Calc stuff onceThomas Gleixner
Calculate the cycle interval shifted value once. No functional change, just makes the code more readable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-22hrtimer: Add hrtimer support for CLOCK_TAIJohn Stultz
Add hrtimer support for CLOCK_TAI, as well as posix timer interfaces. Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-22timekeeping: Add CLOCK_TAI clockidJohn Stultz
This add a CLOCK_TAI clockid and the needed accessors. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-22timekeeping: Move TAI managment into timekeeping core from ntpJohn Stultz
Currently NTP manages the TAI offset. Since there's plans for a CLOCK_TAI clockid, push the TAI management into the timekeeping core. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-21nohz: Assign timekeeping duty to a CPU outside the full dynticks rangeFrederic Weisbecker
This way the full nohz CPUs can safely run with the tick stopped with a guarantee that somebody else is taking care of the jiffies and GTOD progression. Once the duty is attributed to a CPU, it won't change. Also that CPU can't enter into dyntick idle mode or be hot unplugged. This may later be improved from a power consumption POV. At least we should be able to share the duty amongst all CPUs outside the full dynticks range. Then the duty could even be shared with full dynticks CPUs when those can't stop their tick for any reason. But let's start with that very simple approach first. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> [fix have_nohz_full_mask offcase] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-21nohz: Basic full dynticks interfaceFrederic Weisbecker
For extreme usecases such as Real Time or HPC, having the ability to shutdown the tick when a single task runs on a CPU is a desired feature: * Reducing the amount of interrupts improves throughput for CPU-bound tasks. The CPU is less distracted from its real job, from an execution time and from the cache point of views. * This also improve latency response as we have less critical sections. Start with introducing a very simple interface to define full dynticks CPU: use a boot time option defined cpumask through the "nohz_extended=" kernel parameter. CPUs that are part of this range will have their tick shutdown whenever possible: provided they run a single task and they don't do kernel activity that require the periodic tick. These details will be later documented in Documentation/* An online CPU must be kept outside this range to handle the timekeeping. Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-03-15timekeeping: utilize the suspend-nonstop clocksource to count suspended timeFeng Tang
There are some new processors whose TSC clocksource won't stop during suspend. Currently, after system resumes, kernel will use persistent clock or RTC to compensate the sleep time, but with these nonstop clocksources, we could skip the special compensation from external sources, and just use current clocksource for time recounting. This can solve some time drift bugs caused by some not-so-accurate or error-prone RTC devices. The current way to count suspended time is first try to use the persistent clock, and then try the RTC if persistent clock can't be used. This patch will change the trying order to: suspend-nonstop clocksource -> persistent clock -> RTC When counting the sleep time with nonstop clocksource, use an accurate way suggested by Jason Gunthorpe to cover very large delta cycles. Signed-off-by: Feng Tang <feng.tang@intel.com> [jstultz: Small optimization, avoiding re-reading the clocksource] Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-13tick: Provide a check for a forced broadcast pendingThomas Gleixner
On the CPU which gets woken along with the target CPU of the broadcast the following happens: deep_idle() <-- spurious wakeup broadcast_exit() set forced bit enable interrupts <-- Nothing happens disable interrupts broadcast_enter() <-- Here we observe the forced bit is set deep_idle() Now after that the target CPU of the broadcast runs the broadcast handler and finds the other CPU in both the broadcast and the forced mask, sends the IPI and stuff gets back to normal. So it's not actually harmful, just more evidence for the theory, that hardware designers have access to very special drug supplies. Now there is no point in going back to deep idle just to wake up again right away via an IPI. Provide a check which allows the idle code to avoid the deep idle transition. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Jason Liu <liu.h.jason@gmail.com> Link: http://lkml.kernel.org/r/20130306111537.565418308@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-03-13tick: Handle broadcast wakeup of multiple cpusThomas Gleixner
Some brilliant hardware implementations wake multiple cores when the broadcast timer fires. This leads to the following interesting problem: CPU0 CPU1 wakeup from idle wakeup from idle leave broadcast mode leave broadcast mode restart per cpu timer restart per cpu timer go back to idle handle broadcast (empty mask) enter broadcast mode programm broadcast device enter broadcast mode programm broadcast device So what happens is that due to the forced reprogramming of the cpu local timer, we need to set a event in the future. Now if we manage to go back to idle before the timer fires, we switch off the timer and arm the broadcast device with an already expired time (covered by forced mode). So in the worst case we repeat the above ping pong forever. Unfortunately we have no information about what caused the wakeup, but we can check current time against the expiry time of the local cpu. If the local event is already in the past, we know that the broadcast timer is about to fire and send an IPI. So we mark ourself as an IPI target even if we left broadcast mode and avoid the reprogramming of the local cpu timer. This still leaves the possibility that a CPU which is not handling the broadcast interrupt is going to reach idle again before the IPI arrives. This can't be solved in the core code and will be handled in follow up patches. Reported-by: Jason Liu <liu.h.jason@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Link: http://lkml.kernel.org/r/20130306111537.492045206@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-03-13tick: Avoid programming the local cpu timer if broadcast pendingThomas Gleixner
If the local cpu timer stops in deep idle, we arm the broadcast device and get woken by an IPI. Now when we return from deep idle we reenable the local cpu timer unconditionally before handling the IPI. But that's a pointless exercise: the timer is already expired and the IPI is on the way. And it's an expensive exercise as we use the forced reprogramming mode so that we do not lose a timer event. This forced reprogramming will loop at least once in the retry. To avoid this reprogramming, we mark the cpu in a pending bit mask before we send the IPI. Now when the IPI target cpu wakes up, it will see the pending bit set and skip the reprogramming. The reprogramming of the cpu local timer will happen in the IPI handler which runs the cpu local timer interrupt function. Reported-by: Jason Liu <liu.h.jason@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Link: http://lkml.kernel.org/r/20130306111537.431082074@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-03-07clockevents: Don't allow dummy broadcast timersMark Rutland
Currently tick_check_broadcast_device doesn't reject clock_event_devices with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real hardware if they have a higher rating value. In this situation, the dummy timer is responsible for broadcasting to itself, and the core clockevents code may attempt to call non-existent callbacks for programming the dummy, eventually leading to a panic. This patch makes tick_check_broadcast_device always reject dummy timers, preventing this problem. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Jon Medhurst (Tixy) <tixy@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-03-07tick: Dynamically set broadcast irq affinityDaniel Lezcano
When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time frame work to use the broadcast timer instead. Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all. So in the worst case an idle CPU will wake up to send an IPI to the CPU whose timer expired. Provide an opt-in feature CLOCK_EVT_FEAT_DYNIRQ which tells the core that is should set the interrupt affinity of the broadcast interrupt to the cpu which has the earliest expiry time. This avoids unnecessary spurious wakeups and IPIs. [ tglx: Adopted to cpumask rework, silenced an uninitialized warning, massaged changelog ] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: viresh.kumar@linaro.org Cc: jacob.jun.pan@linux.intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: santosh.shilimkar@ti.com Cc: linaro-kernel@lists.linaro.org Cc: patches@linaro.org Cc: rickard.andersson@stericsson.com Cc: vincent.guittot@linaro.org Cc: linus.walleij@stericsson.com Cc: john.stultz@linaro.org Link: http://lkml.kernel.org/r/1362219013-18173-3-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-03-07tick: Pass broadcast device to tick_broadcast_set_event()Daniel Lezcano
Pass the broadcast timer to tick_broadcast_set_event() instead of reevaluating tick_broadcast_device.evtdev. [ tglx: Massaged changelog ] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: viresh.kumar@linaro.org Cc: jacob.jun.pan@linux.intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: santosh.shilimkar@ti.com Cc: linaro-kernel@lists.linaro.org Cc: patches@linaro.org Cc: rickard.andersson@stericsson.com Cc: vincent.guittot@linaro.org Cc: linus.walleij@stericsson.com Cc: john.stultz@linaro.org Link: http://lkml.kernel.org/r/1362219013-18173-2-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-03-07tick: Convert broadcast cpu bitmaps to cpumask_var_tThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130306111537.366394000@linutronix.de Cc: Rusty Russell <rusty@rustcorp.com.au>
2013-02-28Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: "Highlights: - introduction of Dove thermal sensor driver. - introduction of Kirkwood thermal sensor driver. - introduction of intel_powerclamp thermal cooling device driver. - add interrupt and DT support for rcar thermal driver. - add thermal emulation support which allows platform thermal driver to do software/hardware emulation for thermal issues." * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (36 commits) thermal: rcar: remove __devinitconst thermal: return an error on failure to register thermal class Thermal: rename thermal governor Kconfig option to avoid generic naming thermal: exynos: Use the new thermal trend type for quick cooling action. Thermal: exynos: Add support for temperature falling interrupt. Thermal: Dove: Add Themal sensor support for Dove. thermal: Add support for the thermal sensor on Kirkwood SoCs thermal: rcar: add Device Tree support thermal: rcar: remove machine_power_off() from rcar_thermal_notify() thermal: rcar: add interrupt support thermal: rcar: add read/write functions for common/priv data thermal: rcar: multi channel support thermal: rcar: use mutex lock instead of spin lock thermal: rcar: enable CPCTL to use hardware TSC deciding thermal: rcar: use parenthesis on macro Thermal: fix a build warning when CONFIG_THERMAL_EMULATION cleared Thermal: fix a wrong comment thermal: sysfs: Add a new sysfs node emul_temp for thermal emulation PM: intel_powerclamp: off by one in start_power_clamp() thermal: exynos: Miscellaneous fixes to support falling threshold interrupt ...
2013-02-22Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest change is the rwsem lock-steal improvements, both to the assembly optimized and the spinlock based variants. The other notable change is the clean up of the seqlock implementation to be based on the seqcount infrastructure. The rest is assorted smaller debuggability, cleanup and continued -rt locking changes." * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rwsem-spinlock: Implement writer lock-stealing for better scalability futex: Revert "futex: Mark get_robust_list as deprecated" generic: Use raw local irq variant for generic cmpxchg lockdep: Selftest: convert spinlock to raw spinlock seqlock: Use seqcount infrastructure seqlock: Remove unused functions ntp: Make ntp_lock raw intel_idle: Convert i7300_idle_lock to raw_spinlock locking: Various static lock initializer fixes lockdep: Print more info when MAX_LOCK_DEPTH is exceeded rwsem: Implement writer lock-stealing for better scalability lockdep: Silence warning if CONFIG_LOCKDEP isn't set watchdog: Use local_clock for get_timestamp() lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug() locking/stat: Fix a typo
2013-02-21Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM SoC cleanups from Arnd Bergmann: "A large number of cleanups, all over the platforms. This is dominated largely by the Samsung platforms (s3c, s5p, exynos) and a few of the others moving code out of arch/arm into more appropriate subsystems. The clocksource and irqchip drivers are now abstracted to the point where platforms that are already cleaned up do not need to even specify the driver they use, it can all get configured from the device tree as we do for normal device drivers. The clocksource changes basically touch every single platform in the process. We further clean up the use of platform specific header files here, with the goal of turning more of the platforms over to being "multiplatform" enabled, which implies that they cannot expose their headers to architecture independent code any more. It is expected that no functional changes are part of the cleanup. The overall reduction in total code lines is mostly the result of removing broken and obsolete code." * tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (133 commits) ARM: mvebu: correct gated clock documentation ARM: kirkwood: add missing include for nsa310 ARM: exynos: move exynos4210-combiner to drivers/irqchip mfd: db8500-prcmu: update resource passing drivers/db8500-cpufreq: delete dangling include ARM: at91: remove NEOCORE 926 board sunxi: Cleanup the reset code and add meaningful registers defines ARM: S3C24XX: header mach/regs-mem.h local ARM: S3C24XX: header mach/regs-power.h local ARM: S3C24XX: header mach/regs-s3c2412-mem.h local ARM: S3C24XX: Remove plat-s3c24xx directory in arch/arm/ ARM: S3C24XX: transform s3c2443 subirqs into new structure ARM: S3C24XX: modify s3c2443 irq init to initialize all irqs ARM: S3C24XX: move s3c2443 irq code to irq.c ARM: S3C24XX: transform s3c2416 irqs into new structure ARM: S3C24XX: modify s3c2416 irq init to initialize all irqs ARM: S3C24XX: move s3c2416 irq init to common irq code ARM: S3C24XX: Modify s3c_irq_wake to use the hwirq property ARM: S3C24XX: Move irq syscore-ops to irq-pm clocksource: always define CLOCKSOURCE_OF_DECLARE ...
2013-02-19Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Ingo Molnar: "Main changes: - ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to keep the hardware clock updated. - posix-timers: Fix clock_adjtime to always return timex data on success. This is changing the ABI, but no breakage was expected and found - caution is warranted nevertheless. - platform persistent clock improvements/cleanups. - clockevents: refactor timer broadcast handling to be more generic and less duplicated with matching architecture code (mostly ARM motivated.) - various fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet() posix-cpu-timers: Fix nanosleep task_struct leak clockevents: Fix generic broadcast for FEAT_C3STOP time, Fix setting of hardware clock in NTP code hrtimer: Prevent hrtimer_enqueue_reprogram race clockevents: Add generic timer broadcast function clockevents: Add generic timer broadcast receiver timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK x86/time/rtc: Don't print extended CMOS year when reading RTC x86: Select HAS_PERSISTENT_CLOCK on x86 timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option rtc: Skip the suspend/resume handling if persistent clock exist timekeeping: Add persistent_clock_exist flag posix-timers: Fix clock_adjtime to always return timex data on success Round the calculated scale factor in set_cyc2ns_scale() NTP: Add a CONFIG_RTC_SYSTOHC configuration MAINTAINERS: Update John Stultz's email time: create __getnstimeofday for WARNless calls