From f005bd7e3b84a353475a2895e2c7686a66297d87 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Mon, 1 Aug 2016 10:54:15 +0100 Subject: clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered The ARM architected timer produces level-triggered interrupts (this is mandated by the architecture). Unfortunately, a number of device-trees get this wrong, and expose an edge-triggered interrupt. Until now, this wasn't too much an issue, as the programming of the trigger would fail (the corresponding PPI cannot be reconfigured), and the kernel would be happy with this. But we're about to change this, and trust DT a lot if the driver doesn't provide its own trigger information. In that context, the timer breaks badly. While we do need to fix the DTs, there is also some userspace out there (kvmtool) that generates the same kind of broken DT on the fly, and that will completely break with newer kernels. As a safety measure, and to keep buggy software alive as well as buying us some time to fix DTs all over the place, let's check what trigger configuration has been given us by the firmware. If this is not a level configuration, then we know that the DT/ACPI configuration is bust, and we pick some defaults which won't be worse than the existing setup. Signed-off-by: Marc Zyngier Cc: Andrew Lunn Cc: Liu Gang Cc: Mark Rutland Cc: Masahiro Yamada Cc: Wenbin Song Cc: Mingkai Hu Cc: Florian Fainelli Cc: Kevin Hilman Cc: Daniel Lezcano Cc: Michal Simek Cc: Jon Hunter Cc: arm@kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-arm-kernel@lists.infradead.org Cc: Sebastian Hesselbarth Cc: Jason Cooper Cc: Ray Jui Cc: "Hou Zhiqiang" Cc: Tirumalesh Chalamarla Cc: linux-samsung-soc@vger.kernel.org Cc: Yuan Yao Cc: Jan Glauber Cc: Gregory Clement Cc: linux-amlogic@lists.infradead.org Cc: soren.brinkmann@xilinx.com Cc: Rajesh Bhagat Cc: Scott Branden Cc: Duc Dang Cc: Kukjin Kim Cc: Carlo Caione Cc: Dinh Nguyen Link: http://lkml.kernel.org/r/1470045256-9032-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner --- drivers/clocksource/arm_arch_timer.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index 28bce3f4f81d..57700541f951 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -8,6 +8,9 @@ * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ + +#define pr_fmt(fmt) "arm_arch_timer: " fmt + #include #include #include @@ -370,16 +373,33 @@ static bool arch_timer_has_nonsecure_ppi(void) arch_timer_ppi[PHYS_NONSECURE_PPI]); } +static u32 check_ppi_trigger(int irq) +{ + u32 flags = irq_get_trigger_type(irq); + + if (flags != IRQF_TRIGGER_HIGH && flags != IRQF_TRIGGER_LOW) { + pr_warn("WARNING: Invalid trigger for IRQ%d, assuming level low\n", irq); + pr_warn("WARNING: Please fix your firmware\n"); + flags = IRQF_TRIGGER_LOW; + } + + return flags; +} + static int arch_timer_starting_cpu(unsigned int cpu) { struct clock_event_device *clk = this_cpu_ptr(arch_timer_evt); + u32 flags; __arch_timer_setup(ARCH_CP15_TIMER, clk); - enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], 0); + flags = check_ppi_trigger(arch_timer_ppi[arch_timer_uses_ppi]); + enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], flags); - if (arch_timer_has_nonsecure_ppi()) - enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 0); + if (arch_timer_has_nonsecure_ppi()) { + flags = check_ppi_trigger(arch_timer_ppi[PHYS_NONSECURE_PPI]); + enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], flags); + } arch_counter_set_user_access(); if (evtstrm_enable) -- cgit v1.2.3 From 46c8f0b077a838eb1f6169bb370aab8ed98f7630 Mon Sep 17 00:00:00 2001 From: Chris Metcalf Date: Mon, 8 Aug 2016 16:29:07 -0400 Subject: timers: Fix get_next_timer_interrupt() computation The tick_nohz_stop_sched_tick() routine is not properly canceling the sched timer when nothing is pending, because get_next_timer_interrupt() is no longer returning KTIME_MAX in that case. This causes periodic interrupts when none are needed. When determining the next interrupt time, we first use __next_timer_interrupt() to get the first expiring timer in the timer wheel. If no timer is found, we return the base clock value plus NEXT_TIMER_MAX_DELTA to indicate there is no timer in the timer wheel. Back in get_next_timer_interrupt(), we set the "expires" value by converting the timer wheel expiry (in ticks) to a nsec value. But we don't want to do this if the timer wheel expiry value indicates no timer; we want to return KTIME_MAX. Prior to commit 500462a9de65 ("timers: Switch to a non-cascading wheel") we checked base->active_timers to see if any timers were active, and if not, we didn't touch the expiry value and so properly returned KTIME_MAX. Now we don't have active_timers. To fix this, we now just check the timer wheel expiry value to see if it is "now + NEXT_TIMER_MAX_DELTA", and if it is, we don't try to compute a new value based on it, but instead simply let the KTIME_MAX value in expires remain. Fixes: 500462a9de65 "timers: Switch to a non-cascading wheel" Signed-off-by: Chris Metcalf Cc: Frederic Weisbecker Cc: Christoph Lameter Cc: John Stultz Link: http://lkml.kernel.org/r/1470688147-22287-1-git-send-email-cmetcalf@mellanox.com Signed-off-by: Thomas Gleixner --- kernel/time/timer.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 555670a5143c..32bf6f75a8fe 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1496,6 +1496,7 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]); u64 expires = KTIME_MAX; unsigned long nextevt; + bool is_max_delta; /* * Pretend that there is no timer pending if the cpu is offline. @@ -1506,6 +1507,7 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) spin_lock(&base->lock); nextevt = __next_timer_interrupt(base); + is_max_delta = (nextevt == base->clk + NEXT_TIMER_MAX_DELTA); base->next_expiry = nextevt; /* * We have a fresh next event. Check whether we can forward the base: @@ -1519,7 +1521,8 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) expires = basem; base->is_idle = false; } else { - expires = basem + (nextevt - basej) * TICK_NSEC; + if (!is_max_delta) + expires = basem + (nextevt - basej) * TICK_NSEC; /* * If we expect to sleep more than a tick, mark the base idle: */ -- cgit v1.2.3 From 1a9e4c564ab174e53ed86def922804a5ddc63e7d Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Thu, 14 Jul 2016 17:22:54 +0200 Subject: x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error I noticed the following bug/misbehavior on certain Intel systems: with a single task running on a NOHZ CPU on an Intel Haswell, I recognized that I did not only get the one expected local_timer APIC interrupt, but two per second at minimum. (!) Further tracing showed that the first one precedes the programmed deadline by up to ~50us and hence, it did nothing except for reprogramming the TSC deadline clockevent device to trigger shortly thereafter again. The reason for this is imprecise calibration, the timeout we program into the APIC results in 'too short' timer interrupts. The core (hr)timer code notices this (because it has a precise ktime source and sees the short interrupt) and fixes it up by programming an additional very short interrupt period. This is obviously suboptimal. The reason for the imprecise calibration is twofold, and this patch fixes the first reason: In setup_APIC_timer(), the registered clockevent device's frequency is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying it with 1000 afterwards: (tsc_khz / TSC_DIVISOR) * 1000 The multiplication with 1000 is done for converting from kHz to Hz and the division by TSC_DIVISOR is carried out in order to make sure that the final result fits into an u32. However, with the order given in this calculation, the roundoff error introduced by the division gets magnified by a factor of 1000 by the following multiplication. To fix it, reversing the order of the division and the multiplication a la: (tsc_khz * 1000) / TSC_DIVISOR ... reduces the roundoff error already. Furthermore, if TSC_DIVISOR divides 1000, associativity holds: (tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR) and thus, the roundoff error even vanishes and the whole operation can be carried out within 32 bits. The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for TSC_DIVISOR still allows for TSC frequencies up to 2^32 / 10^9ns * 8 = 34.4GHz which is way larger than anything to expect in the next years. Thus we also replace the current TSC_DIVISOR value of 32 by 8. Reverse the order of the divison and the multiplication in the calculation of the registered clockevent device's frequency. Signed-off-by: Nicolai Stange Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Paolo Bonzini Acked-by: Thomas Gleixner Cc: Adrian Hunter Cc: Borislav Petkov Cc: Christopher S. Hall Cc: H. Peter Anvin Cc: Hidehiro Kawai Cc: Len Brown Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Viresh Kumar Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com [ Improved changelog. ] Signed-off-by: Ingo Molnar --- arch/x86/kernel/apic/apic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index ac8d8ad8b009..a315dc404756 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -313,7 +313,7 @@ int lapic_get_maxlvt(void) /* Clock divisor */ #define APIC_DIVISOR 16 -#define TSC_DIVISOR 32 +#define TSC_DIVISOR 8 /* * This function sets up the local APIC timer, with a timeout of @@ -565,7 +565,7 @@ static void setup_APIC_timer(void) CLOCK_EVT_FEAT_DUMMY); levt->set_next_event = lapic_next_deadline; clockevents_config_and_register(levt, - (tsc_khz / TSC_DIVISOR) * 1000, + tsc_khz * (1000 / TSC_DIVISOR), 0xF, ~0UL); } else clockevents_register_device(levt); -- cgit v1.2.3 From 6731b0d611a1274f9e785fa0189ac2aeeabd0591 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Thu, 14 Jul 2016 17:22:55 +0200 Subject: x86/timers/apic: Inform TSC deadline clockevent device about recalibration This patch eliminates a source of imprecise APIC timer interrupts, which imprecision may result in double interrupts or even late interrupts. The TSC deadline clockevent devices' configuration and registration happens before the TSC frequency calibration is refined in tsc_refine_calibration_work(). This results in the TSC clocksource and the TSC deadline clockevent devices being configured with slightly different frequencies: the former gets the refined one and the latter are configured with the inaccurate frequency detected earlier by means of the "Fast TSC calibration using PIT". Within the APIC code, introduce the notifier function lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline clockevent devices with the current tsc_khz. Call it from the TSC code after TSC calibration refinement has happened. Signed-off-by: Nicolai Stange Signed-off-by: Peter Zijlstra (Intel) Acked-by: Thomas Gleixner Cc: Adrian Hunter Cc: Borislav Petkov Cc: Christopher S. Hall Cc: H. Peter Anvin Cc: Hidehiro Kawai Cc: Len Brown Cc: Linus Torvalds Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Viresh Kumar Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com [ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ] Signed-off-by: Ingo Molnar --- arch/x86/include/asm/apic.h | 2 ++ arch/x86/kernel/apic/apic.c | 24 ++++++++++++++++++++++++ arch/x86/kernel/tsc.c | 4 ++++ 3 files changed, 30 insertions(+) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index f5befd4945f2..124357773ffa 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -135,6 +135,7 @@ extern void init_apic_mappings(void); void register_lapic_address(unsigned long address); extern void setup_boot_APIC_clock(void); extern void setup_secondary_APIC_clock(void); +extern void lapic_update_tsc_freq(void); extern int APIC_init_uniprocessor(void); #ifdef CONFIG_X86_64 @@ -170,6 +171,7 @@ static inline void init_apic_mappings(void) { } static inline void disable_local_APIC(void) { } # define setup_boot_APIC_clock x86_init_noop # define setup_secondary_APIC_clock x86_init_noop +static inline void lapic_update_tsc_freq(void) { } #endif /* !CONFIG_X86_LOCAL_APIC */ #ifdef CONFIG_X86_X2APIC diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index a315dc404756..0fd3d659f13c 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -571,6 +571,30 @@ static void setup_APIC_timer(void) clockevents_register_device(levt); } +/* + * Install the updated TSC frequency from recalibration at the TSC + * deadline clockevent devices. + */ +static void __lapic_update_tsc_freq(void *info) +{ + struct clock_event_device *levt = this_cpu_ptr(&lapic_events); + + if (!this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) + return; + + clockevents_update_freq(levt, tsc_khz * (1000 / TSC_DIVISOR)); +} + +void lapic_update_tsc_freq(void) +{ + /* + * The clockevent device's ->mult and ->shift can both be + * changed. In order to avoid races, schedule the frequency + * update code on each CPU. + */ + on_each_cpu(__lapic_update_tsc_freq, NULL, 0); +} + /* * In this functions we calibrate APIC bus clocks to the external timer. * diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a804b5ab32d0..8fb4b6abac0e 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -22,6 +22,7 @@ #include #include #include +#include unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */ EXPORT_SYMBOL(cpu_khz); @@ -1249,6 +1250,9 @@ static void tsc_refine_calibration_work(struct work_struct *work) (unsigned long)tsc_khz / 1000, (unsigned long)tsc_khz % 1000); + /* Inform the TSC deadline clockevent devices about the recalibration */ + lapic_update_tsc_freq(); + out: if (boot_cpu_has(X86_FEATURE_ART)) art_related_clocksource = &clocksource_tsc; -- cgit v1.2.3 From 22cc1ca3c5469cf17e149be232817b9223afa5e4 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 9 Aug 2016 21:54:53 +0200 Subject: x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Ville Syrjälä reports "The first time I run hwclock after rebooting I get this: open("/dev/rtc", O_RDONLY) = 3 ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = 0 select(4, [3], NULL, NULL, {10, 0}) = 0 (Timeout) ioctl(3, PHN_NOT_OH or RTC_UIE_OFF, 0) = 0 close(3) = 0 On all subsequent runs I get this: open("/dev/rtc", O_RDONLY) = 3 ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = -1 EINVAL (Invalid argument) ioctl(3, RTC_RD_TIME, 0x7ffd76b3ae70) = -1 EINVAL (Invalid argument) close(3) = 0" This was caused by a stupid typo in a patch that should have been a simple rename to move around contents of a header file, but accidentally wrote zeroes into the rtc rather than reading from it: 463a86304cae ("char/genrtc: x86: remove remnants of asm/rtc.h") Reported-by: Ville Syrjälä Tested-by: Jarkko Nikula Tested-by: Ville Syrjälä Signed-off-by: Arnd Bergmann Cc: Alessandro Zummo Cc: Alexandre Belloni Cc: Borislav Petkov Cc: Geert Uytterhoeven Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: rtc-linux@googlegroups.com Fixes: 463a86304cae ("char/genrtc: x86: remove remnants of asm/rtc.h") Link: http://lkml.kernel.org/r/20160809195528.1604312-1-arnd@arndb.de Signed-off-by: Ingo Molnar --- arch/x86/kernel/hpet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index ed16e58658a4..c6dfd801df97 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1242,7 +1242,7 @@ irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id) memset(&curr_time, 0, sizeof(struct rtc_time)); if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) - mc146818_set_time(&curr_time); + mc146818_get_time(&curr_time); if (hpet_rtc_flags & RTC_UIE && curr_time.tm_sec != hpet_prev_update_sec) { -- cgit v1.2.3