diff options
241 files changed, 5291 insertions, 5555 deletions
diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt new file mode 100644 index 000000000000..b29f03a6d82f --- /dev/null +++ b/Documentation/riscv/pmu.txt @@ -0,0 +1,249 @@ +Supporting PMUs on RISC-V platforms +========================================== +Alan Kao <alankao@andestech.com>, Mar 2018 + +Introduction +------------ + +As of this writing, perf_event-related features mentioned in The RISC-V ISA +Privileged Version 1.10 are as follows: +(please check the manual for more details) + +* [m|s]counteren +* mcycle[h], cycle[h] +* minstret[h], instret[h] +* mhpeventx, mhpcounterx[h] + +With such function set only, porting perf would require a lot of work, due to +the lack of the following general architectural performance monitoring features: + +* Enabling/Disabling counters + Counters are just free-running all the time in our case. +* Interrupt caused by counter overflow + No such feature in the spec. +* Interrupt indicator + It is not possible to have many interrupt ports for all counters, so an + interrupt indicator is required for software to tell which counter has + just overflowed. +* Writing to counters + There will be an SBI to support this since the kernel cannot modify the + counters [1]. Alternatively, some vendor considers to implement + hardware-extension for M-S-U model machines to write counters directly. + +This document aims to provide developers a quick guide on supporting their +PMUs in the kernel. The following sections briefly explain perf' mechanism +and todos. + +You may check previous discussions here [1][2]. Also, it might be helpful +to check the appendix for related kernel structures. + + +1. Initialization +----------------- + +*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains +various methods according to perf's internal convention and PMU-specific +parameters. One should declare such instance to represent the PMU. By default, +*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very +basic support to a baseline QEMU model. + +Then he/she can either assign the instance's pointer to *riscv_pmu* so that +the minimal and already-implemented logic can be leveraged, or invent his/her +own *riscv_init_platform_pmu* implementation. + +In other words, existing sources of *riscv_base_pmu* merely provide a +reference implementation. Developers can flexibly decide how many parts they +can leverage, and in the most extreme case, they can customize every function +according to their needs. + + +2. Event Initialization +----------------------- + +When a user launches a perf command to monitor some events, it is first +interpreted by the userspace perf tool into multiple *perf_event_open* +system calls, and then each of them calls to the body of *event_init* +member function that was assigned in the previous step. In *riscv_base_pmu*'s +case, it is *riscv_event_init*. + +The main purpose of this function is to translate the event provided by user +into bitmap, so that HW-related control registers or counters can directly be +manipulated. The translation is based on the mappings and methods provided in +*riscv_pmu*. + +Note that some features can be done in this stage as well: + +(1) interrupt setting, which is stated in the next section; +(2) privilege level setting (user space only, kernel space only, both); +(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*; +(4) tweaks for non-sampling events, which will be utilized by functions such as +*perf_adjust_period*, usually something like the follows: + +if (!is_sampling_event(event)) { + hwc->sample_period = x86_pmu.max_period; + hwc->last_period = hwc->sample_period; + local64_set(&hwc->period_left, hwc->sample_period); +} + +In the case of *riscv_base_pmu*, only (3) is provided for now. + + +3. Interrupt +------------ + +3.1. Interrupt Initialization + +This often occurs at the beginning of the *event_init* method. In common +practice, this should be a code segment like + +int x86_reserve_hardware(void) +{ + int err = 0; + + if (!atomic_inc_not_zero(&pmc_refcount)) { + mutex_lock(&pmc_reserve_mutex); + if (atomic_read(&pmc_refcount) == 0) { + if (!reserve_pmc_hardware()) + err = -EBUSY; + else + reserve_ds_buffers(); + } + if (!err) + atomic_inc(&pmc_refcount); + mutex_unlock(&pmc_reserve_mutex); + } + + return err; +} + +And the magic is in *reserve_pmc_hardware*, which usually does atomic +operations to make implemented IRQ accessible from some global function pointer. +*release_pmc_hardware* serves the opposite purpose, and it is used in event +destructors mentioned in previous section. + +(Note: From the implementations in all the architectures, the *reserve/release* +pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading. +It does NOT deal with the binding between an event and a physical counter, +which will be introduced in the next section.) + +3.2. IRQ Structure + +Basically, a IRQ runs the following pseudo code: + +for each hardware counter that triggered this overflow + + get the event of this counter + + // following two steps are defined as *read()*, + // check the section Reading/Writing Counters for details. + count the delta value since previous interrupt + update the event->count (# event occurs) by adding delta, and + event->hw.period_left by subtracting delta + + if the event overflows + sample data + set the counter appropriately for the next overflow + + if the event overflows again + too frequently, throttle this event + fi + fi + +end for + +However as of this writing, none of the RISC-V implementations have designed an +interrupt for perf, so the details are to be completed in the future. + +4. Reading/Writing Counters +--------------------------- + +They seem symmetric but perf treats them quite differently. For reading, there +is a *read* interface in *struct pmu*, but it serves more than just reading. +According to the context, the *read* function not only reads the content of the +counter (event->count), but also updates the left period to the next interrupt +(event->hw.period_left). + +But the core of perf does not need direct write to counters. Writing counters +is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one +has to set the counter to a good value for the next interrupt; 2) inside the IRQ +it should set the counter to the same resonable value. + +Reading is not a problem in RISC-V but writing would need some effort, since +counters are not allowed to be written by S-mode. + + +5. add()/del()/start()/stop() +----------------------------- + +Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop() +starts/stop the counter of some event in the PMU. All of them take the same +arguments: *struct perf_event *event* and *int flag*. + +Consider perf as a state machine, then you will find that these functions serve +as the state transition process between those states. +Three states (event->hw.state) are defined: + +* PERF_HES_STOPPED: the counter is stopped +* PERF_HES_UPTODATE: the event->count is up-to-date +* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now + +A normal flow of these state transitions are as follows: + +* A user launches a perf event, resulting in calling to *event_init*. +* When being context-switched in, *add* is called by the perf core, with a flag + PERF_EF_START, which means that the event should be started after it is added. + At this stage, a general event is bound to a physical counter, if any. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now + stopped, and the (software) event count does not need updating. +** *start* is then called, and the counter is enabled. + With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check + previous section for detail). + Nothing is written if the flag does not contain PERF_EF_RELOAD. + The state now is reset to none, because it is neither stopped nor updated + (the counting already started) +* When being context-switched out, *del* is called. It then checks out all the + events in the PMU and calls *stop* to update their counts. +** *stop* is called by *del* + and the perf core with flag PERF_EF_UPDATE, and it often shares the same + subroutine as *read* with the same logic. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again. + +** Life cycle of these two pairs: *add* and *del* are called repeatedly as + tasks switch in-and-out; *start* and *stop* is also called when the perf core + needs a quick stop-and-start, for instance, when the interrupt period is being + adjusted. + +Current implementation is sufficient for now and can be easily extended to +features in the future. + +A. Related Structures +--------------------- + +* struct pmu: include/linux/perf_event.h +* struct riscv_pmu: arch/riscv/include/asm/perf_event.h + + Both structures are designed to be read-only. + + *struct pmu* defines some function pointer interfaces, and most of them take +*struct perf_event* as a main argument, dealing with perf events according to +perf's internal state machine (check kernel/events/core.c for details). + + *struct riscv_pmu* defines PMU-specific parameters. The naming follows the +convention of all other architectures. + +* struct perf_event: include/linux/perf_event.h +* struct hw_perf_event + + The generic structure that represents perf events, and the hardware-related +details. + +* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h + + The structure that holds the status of events, has two fixed members: +the number of events and the array of the events. + +References +---------- + +[1] https://github.com/riscv/riscv-linux/pull/124 +[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA diff --git a/MAINTAINERS b/MAINTAINERS index fd3fc63f2759..9d5eeff51b5f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10273,18 +10273,16 @@ F: arch/arm/boot/dts/*am5* F: arch/arm/boot/dts/*dra7* OMAP DISPLAY SUBSYSTEM and FRAMEBUFFER SUPPORT (DSS2) -M: Tomi Valkeinen <tomi.valkeinen@ti.com> L: linux-omap@vger.kernel.org L: linux-fbdev@vger.kernel.org -S: Maintained +S: Orphan F: drivers/video/fbdev/omap2/ F: Documentation/arm/OMAP/DSS OMAP FRAMEBUFFER SUPPORT -M: Tomi Valkeinen <tomi.valkeinen@ti.com> L: linux-fbdev@vger.kernel.org L: linux-omap@vger.kernel.org -S: Maintained +S: Orphan F: drivers/video/fbdev/omap/ OMAP GENERAL PURPOSE MEMORY CONTROLLER SUPPORT @@ -12179,7 +12177,7 @@ F: drivers/mtd/nand/raw/r852.h RISC-V ARCHITECTURE M: Palmer Dabbelt <palmer@sifive.com> -M: Albert Ou <albert@sifive.com> +M: Albert Ou <aou@eecs.berkeley.edu> L: linux-riscv@lists.infradead.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux.git S: Supported @@ -12939,6 +12937,14 @@ F: drivers/media/usb/siano/ F: drivers/media/usb/siano/ F: drivers/media/mmc/siano/ +SIFIVE DRIVERS +M: Palmer Dabbelt <palmer@sifive.com> +L: linux-riscv@lists.infradead.org +T: git git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux.git +S: Supported +K: sifive +N: sifive + SILEAD TOUCHSCREEN DRIVER M: Hans de Goede <hdegoede@redhat.com> L: linux-input@vger.kernel.org @@ -15007,8 +15013,7 @@ F: drivers/media/usb/zr364xx/ USER-MODE LINUX (UML) M: Jeff Dike <jdike@addtoit.com> M: Richard Weinberger <richard@nod.at> -L: user-mode-linux-devel@lists.sourceforge.net -L: user-mode-linux-user@lists.sourceforge.net +L: linux-um@lists.infradead.org W: http://user-mode-linux.sourceforge.net T: git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git S: Maintained diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index aa9e785c59c2..7841b8a60657 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -134,7 +134,13 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip); void pnv_power9_force_smt4_catch(void); void pnv_power9_force_smt4_release(void); +/* Transaction memory related */ void tm_enable(void); void tm_disable(void); void tm_abort(uint8_t cause); + +struct kvm_vcpu; +void _kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu, u64 guest_msr); +void _kvmppc_save_tm_pr(struct kvm_vcpu *vcpu, u64 guest_msr); + #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */ diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index e7377b73cfec..1f345a0b6ba2 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -104,6 +104,7 @@ struct kvmppc_vcore { ulong vtb; /* virtual timebase */ ulong conferring_threads; unsigned int halt_poll_ns; + atomic_t online_count; }; struct kvmppc_vcpu_book3s { @@ -209,6 +210,7 @@ extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec) extern void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); extern void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags); +extern void kvmppc_trigger_fac_interrupt(struct kvm_vcpu *vcpu, ulong fac); extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, bool upper, u32 val); extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr); @@ -256,6 +258,21 @@ extern int kvmppc_hcall_impl_pr(unsigned long cmd); extern int kvmppc_hcall_impl_hv_realmode(unsigned long cmd); extern void kvmppc_copy_to_svcpu(struct kvm_vcpu *vcpu); extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu); + +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +void kvmppc_save_tm_pr(struct kvm_vcpu *vcpu); +void kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu); +void kvmppc_save_tm_sprs(struct kvm_vcpu *vcpu); +void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu); +#else +static inline void kvmppc_save_tm_pr(struct kvm_vcpu *vcpu) {} +static inline void kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu) {} +static inline void kvmppc_save_tm_sprs(struct kvm_vcpu *vcpu) {} +static inline void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu) {} +#endif + +void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); + extern int kvm_irq_bypass; static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu) @@ -274,12 +291,12 @@ static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu) static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val) { - vcpu->arch.gpr[num] = val; + vcpu->arch.regs.gpr[num] = val; } static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) { - return vcpu->arch.gpr[num]; + return vcpu->arch.regs.gpr[num]; } static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val) @@ -294,42 +311,42 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.xer = val; + vcpu->arch.regs.xer = val; } static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { - return vcpu->arch.xer; + return vcpu->arch.regs.xer; } static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.ctr = val; + vcpu->arch.regs.ctr = val; } static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu) { - return vcpu->arch.ctr; + return vcpu->arch.regs.ctr; } static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.lr = val; + vcpu->arch.regs.link = val; } static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu) { - return vcpu->arch.lr; + return vcpu->arch.regs.link; } static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.pc = val; + vcpu->arch.regs.nip = val; } static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu) { - return vcpu->arch.pc; + return vcpu->arch.regs.nip; } static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index c424e44f4c00..dc435a5af7d6 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -483,15 +483,15 @@ static inline u64 sanitize_msr(u64 msr) static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu) { vcpu->arch.cr = vcpu->arch.cr_tm; - vcpu->arch.xer = vcpu->arch.xer_tm; - vcpu->arch.lr = vcpu->arch.lr_tm; - vcpu->arch.ctr = vcpu->arch.ctr_tm; + vcpu->arch.regs.xer = vcpu->arch.xer_tm; + vcpu->arch.regs.link = vcpu->arch.lr_tm; + vcpu->arch.regs.ctr = vcpu->arch.ctr_tm; vcpu->arch.amr = vcpu->arch.amr_tm; vcpu->arch.ppr = vcpu->arch.ppr_tm; vcpu->arch.dscr = vcpu->arch.dscr_tm; vcpu->arch.tar = vcpu->arch.tar_tm; - memcpy(vcpu->arch.gpr, vcpu->arch.gpr_tm, - sizeof(vcpu->arch.gpr)); + memcpy(vcpu->arch.regs.gpr, vcpu->arch.gpr_tm, + sizeof(vcpu->arch.regs.gpr)); vcpu->arch.fp = vcpu->arch.fp_tm; vcpu->arch.vr = vcpu->arch.vr_tm; vcpu->arch.vrsave = vcpu->arch.vrsave_tm; @@ -500,15 +500,15 @@ static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu) static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu) { vcpu->arch.cr_tm = vcpu->arch.cr; - vcpu->arch.xer_tm = vcpu->arch.xer; - vcpu->arch.lr_tm = vcpu->arch.lr; - vcpu->arch.ctr_tm = vcpu->arch.ctr; + vcpu->arch.xer_tm = vcpu->arch.regs.xer; + vcpu->arch.lr_tm = vcpu->arch.regs.link; + vcpu->arch.ctr_tm = vcpu->arch.regs.ctr; vcpu->arch.amr_tm = vcpu->arch.amr; vcpu->arch.ppr_tm = vcpu->arch.ppr; vcpu->arch.dscr_tm = vcpu->arch.dscr; vcpu->arch.tar_tm = vcpu->arch.tar; - memcpy(vcpu->arch.gpr_tm, vcpu->arch.gpr, - sizeof(vcpu->arch.gpr)); + memcpy(vcpu->arch.gpr_tm, vcpu->arch.regs.gpr, + sizeof(vcpu->arch.regs.gpr)); vcpu->arch.fp_tm = vcpu->arch.fp; vcpu->arch.vr_tm = vcpu->arch.vr; vcpu->arch.vrsave_tm = vcpu->arch.vrsave; diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index bc6e29e4dfd4..d513e3ed1c65 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -36,12 +36,12 @@ static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val) { - vcpu->arch.gpr[num] = val; + vcpu->arch.regs.gpr[num] = val; } static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) { - return vcpu->arch.gpr[num]; + return vcpu->arch.regs.gpr[num]; } static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val) @@ -56,12 +56,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.xer = val; + vcpu->arch.regs.xer = val; } static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { - return vcpu->arch.xer; + return vcpu->arch.regs.xer; } static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu) @@ -72,32 +72,32 @@ static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu) static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.ctr = val; + vcpu->arch.regs.ctr = val; } static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu) { - return vcpu->arch.ctr; + return vcpu->arch.regs.ctr; } static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.lr = val; + vcpu->arch.regs.link = val; } static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu) { - return vcpu->arch.lr; + return vcpu->arch.regs.link; } static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val) { - vcpu->arch.pc = val; + vcpu->arch.regs.nip = val; } static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu) { - return vcpu->arch.pc; + return vcpu->arch.regs.nip; } static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 17498e9a26e4..fa4efa7e88f7 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -269,7 +269,6 @@ struct kvm_arch { unsigned long host_lpcr; unsigned long sdr1; unsigned long host_sdr1; - int tlbie_lock; unsigned long lpcr; unsigned long vrma_slb_v; int mmu_ready; @@ -454,6 +453,12 @@ struct mmio_hpte_cache { #define KVMPPC_VSX_COPY_WORD 1 #define KVMPPC_VSX_COPY_DWORD 2 #define KVMPPC_VSX_COPY_DWORD_LOAD_DUMP 3 +#define KVMPPC_VSX_COPY_WORD_LOAD_DUMP 4 + +#define KVMPPC_VMX_COPY_BYTE 8 +#define KVMPPC_VMX_COPY_HWORD 9 +#define KVMPPC_VMX_COPY_WORD 10 +#define KVMPPC_VMX_COPY_DWORD 11 struct openpic; @@ -486,7 +491,7 @@ struct kvm_vcpu_arch { struct kvmppc_book3s_shadow_vcpu *shadow_vcpu; #endif - ulong gpr[32]; + struct pt_regs regs; struct thread_fp_state fp; @@ -521,14 +526,10 @@ struct kvm_vcpu_arch { u32 qpr[32]; #endif - ulong pc; - ulong ctr; - ulong lr; #ifdef CONFIG_PPC_BOOK3S ulong tar; #endif - ulong xer; u32 cr; #ifdef CONFIG_PPC_BOOK3S @@ -626,7 +627,6 @@ struct kvm_vcpu_arch { struct thread_vr_state vr_tm; u32 vrsave_tm; /* also USPRG0 */ - #endif #ifdef CONFIG_KVM_EXIT_TIMING @@ -681,16 +681,17 @@ struct kvm_vcpu_arch { * Number of simulations for vsx. * If we use 2*8bytes to simulate 1*16bytes, * then the number should be 2 and - * mmio_vsx_copy_type=KVMPPC_VSX_COPY_DWORD. + * mmio_copy_type=KVMPPC_VSX_COPY_DWORD. * If we use 4*4bytes to simulate 1*16bytes, * the number should be 4 and * mmio_vsx_copy_type=KVMPPC_VSX_COPY_WORD. */ u8 mmio_vsx_copy_nums; u8 mmio_vsx_offset; - u8 mmio_vsx_copy_type; u8 mmio_vsx_tx_sx_enabled; u8 mmio_vmx_copy_nums; + u8 mmio_vmx_offset; + u8 mmio_copy_type; u8 osi_needed; u8 osi_enabled; u8 papr_enabled; @@ -772,6 +773,8 @@ struct kvm_vcpu_arch { u64 busy_preempt; u32 emul_inst; + + u32 online; #endif #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index abe7032cdb54..e991821dd7fa 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -52,7 +52,7 @@ enum emulation_result { EMULATE_EXIT_USER, /* emulation requires exit to user-space */ }; -enum instruction_type { +enum instruction_fetch_type { INST_GENERIC, INST_SC, /* system call */ }; @@ -81,10 +81,10 @@ extern int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu, extern int kvmppc_handle_vsx_load(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int rt, unsigned int bytes, int is_default_endian, int mmio_sign_extend); -extern int kvmppc_handle_load128_by2x64(struct kvm_run *run, - struct kvm_vcpu *vcpu, unsigned int rt, int is_default_endian); -extern int kvmppc_handle_store128_by2x64(struct kvm_run *run, - struct kvm_vcpu *vcpu, unsigned int rs, int is_default_endian); +extern int kvmppc_handle_vmx_load(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int rt, unsigned int bytes, int is_default_endian); +extern int kvmppc_handle_vmx_store(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int rs, unsigned int bytes, int is_default_endian); extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, u64 val, unsigned int bytes, int is_default_endian); @@ -93,7 +93,7 @@ extern int kvmppc_handle_vsx_store(struct kvm_run *run, struct kvm_vcpu *vcpu, int is_default_endian); extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, - enum instruction_type type, u32 *inst); + enum instruction_fetch_type type, u32 *inst); extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); @@ -265,6 +265,8 @@ union kvmppc_one_reg { vector128 vval; u64 vsxval[2]; u32 vsx32val[4]; + u16 vsx16val[8]; + u8 vsx8val[16]; struct { u64 addr; u64 length; @@ -324,13 +326,14 @@ struct kvmppc_ops { int (*get_rmmu_info)(struct kvm *kvm, struct kvm_ppc_rmmu_info *info); int (*set_smt_mode)(struct kvm *kvm, unsigned long mode, unsigned long flags); + void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr); }; extern struct kvmppc_ops *kvmppc_hv_ops; extern struct kvmppc_ops *kvmppc_pr_ops; static inline int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, - enum instruction_type type, u32 *inst) + enum instruction_fetch_type type, u32 *inst) { int ret = EMULATE_DONE; u32 fetched_inst; diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 75c5b2cd9d66..562568414cf4 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -385,6 +385,7 @@ #define SPRN_PSSCR 0x357 /* Processor Stop Status and Control Register (ISA 3.0) */ #define SPRN_PSSCR_PR 0x337 /* PSSCR ISA 3.0, privileged mode access */ #define SPRN_PMCR 0x374 /* Power Management Control Register */ +#define SPRN_RWMR 0x375 /* Region-Weighting Mode Register */ /* HFSCR and FSCR bit numbers are the same */ #define FSCR_SCV_LG 12 /* Enable System Call Vectored */ diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 833ed9a16adf..1b32b56a03d3 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -633,6 +633,7 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_PSSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbd) #define KVM_REG_PPC_DEC_EXPIRY (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbe) +#define KVM_REG_PPC_ONLINE (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf) /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9fc9e0977009..0a0544335950 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -426,20 +426,20 @@ int main(void) OFFSET(VCPU_HOST_STACK, kvm_vcpu, arch.host_stack); OFFSET(VCPU_HOST_PID, kvm_vcpu, arch.host_pid); OFFSET(VCPU_GUEST_PID, kvm_vcpu, arch.pid); - OFFSET(VCPU_GPRS, kvm_vcpu, arch.gpr); + OFFSET(VCPU_GPRS, kvm_vcpu, arch.regs.gpr); OFFSET(VCPU_VRSAVE, kvm_vcpu, arch.vrsave); OFFSET(VCPU_FPRS, kvm_vcpu, arch.fp.fpr); #ifdef CONFIG_ALTIVEC OFFSET(VCPU_VRS, kvm_vcpu, arch.vr.vr); #endif - OFFSET(VCPU_XER, kvm_vcpu, arch.xer); - OFFSET(VCPU_CTR, kvm_vcpu, arch.ctr); - OFFSET(VCPU_LR, kvm_vcpu, arch.lr); + OFFSET(VCPU_XER, kvm_vcpu, arch.regs.xer); + OFFSET(VCPU_CTR, kvm_vcpu, arch.regs.ctr); + OFFSET(VCPU_LR, kvm_vcpu, arch.regs.link); #ifdef CONFIG_PPC_BOOK3S OFFSET(VCPU_TAR, kvm_vcpu, arch.tar); #endif OFFSET(VCPU_CR, kvm_vcpu, arch.cr); - OFFSET(VCPU_PC, kvm_vcpu, arch.pc); + OFFSET(VCPU_PC, kvm_vcpu, arch.regs.nip); #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE OFFSET(VCPU_MSR, kvm_vcpu, arch.shregs.msr); OFFSET(VCPU_SRR0, kvm_vcpu, arch.shregs.srr0); @@ -696,10 +696,10 @@ int main(void) #else /* CONFIG_PPC_BOOK3S */ OFFSET(VCPU_CR, kvm_vcpu, arch.cr); - OFFSET(VCPU_XER, kvm_vcpu, arch.xer); - OFFSET(VCPU_LR, kvm_vcpu, arch.lr); - OFFSET(VCPU_CTR, kvm_vcpu, arch.ctr); - OFFSET(VCPU_PC, kvm_vcpu, arch.pc); + OFFSET(VCPU_XER, kvm_vcpu, arch.regs.xer); + OFFSET(VCPU_LR, kvm_vcpu, arch.regs.link); + OFFSET(VCPU_CTR, kvm_vcpu, arch.regs.ctr); + OFFSET(VCPU_PC, kvm_vcpu, arch.regs.nip); OFFSET(VCPU_SPRG9, kvm_vcpu, arch.sprg9); OFFSET(VCPU_LAST_INST, kvm_vcpu, arch.last_inst); OFFSET(VCPU_FAULT_DEAR, kvm_vcpu, arch.fault_dear); diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 4b19da8c87ae..f872c04bb5b1 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -63,6 +63,9 @@ kvm-pr-y := \ book3s_64_mmu.o \ book3s_32_mmu.o +kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \ + tm.o + ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \ book3s_rmhandlers.o diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 97d4a112648f..edaf4720d156 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -134,7 +134,7 @@ void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags) { kvmppc_unfixup_split_real(vcpu); kvmppc_set_srr0(vcpu, kvmppc_get_pc(vcpu)); - kvmppc_set_srr1(vcpu, kvmppc_get_msr(vcpu) | flags); + kvmppc_set_srr1(vcpu, (kvmppc_get_msr(vcpu) & ~0x783f0000ul) | flags); kvmppc_set_pc(vcpu, kvmppc_interrupt_offset(vcpu) + vec); vcpu->arch.mmu.reset_msr(vcpu); } @@ -256,18 +256,15 @@ void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar, { kvmppc_set_dar(vcpu, dar); kvmppc_set_dsisr(vcpu, flags); - kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE); + kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, 0); } -EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage); /* used by kvm_hv */ +EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage); void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags) { - u64 msr = kvmppc_get_msr(vcpu); - msr &= ~(SRR1_ISI_NOPT | SRR1_ISI_N_OR_G | SRR1_ISI_PROT); - msr |= flags & (SRR1_ISI_NOPT | SRR1_ISI_N_OR_G | SRR1_ISI_PROT); - kvmppc_set_msr_fast(vcpu, msr); - kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE); + kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_INST_STORAGE, flags); } +EXPORT_SYMBOL_GPL(kvmppc_core_queue_inst_storage); static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) @@ -450,8 +447,8 @@ int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid, return r; } -int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type, - u32 *inst) +int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, + enum instruction_fetch_type type, u32 *inst) { ulong pc = kvmppc_get_pc(vcpu); int r; @@ -509,8 +506,6 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) { int i; - vcpu_load(vcpu); - regs->pc = kvmppc_get_pc(vcpu); regs->cr = kvmppc_get_cr(vcpu); regs->ctr = kvmppc_get_ctr(vcpu); @@ -532,7 +527,6 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) for (i = 0; i < ARRAY_SIZE(regs->gpr); i++) regs->gpr[i] = kvmppc_get_gpr(vcpu, i); - vcpu_put(vcpu); return 0; } @@ -540,8 +534,6 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) { int i; - vcpu_load(vcpu); - kvmppc_set_pc(vcpu, regs->pc); kvmppc_set_cr(vcpu, regs->cr); kvmppc_set_ctr(vcpu, regs->ctr); @@ -562,7 +554,6 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) for (i = 0; i < ARRAY_SIZE(regs->gpr); i++) kvmppc_set_gpr(vcpu, i, regs->gpr[i]); - vcpu_put(vcpu); return 0; } diff --git a/arch/powerpc/kvm/book3s.h b/arch/powerpc/kvm/book3s.h index 4ad5e287b8bc..14ef03501d21 100644 --- a/arch/powerpc/kvm/book3s.h +++ b/arch/powerpc/kvm/book3s.h @@ -31,4 +31,10 @@ extern int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, extern int kvmppc_book3s_init_pr(void); extern void kvmppc_book3s_exit_pr(void); +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +extern void kvmppc_emulate_tabort(struct kvm_vcpu *vcpu, int ra_val); +#else +static inline void kvmppc_emulate_tabort(struct kvm_vcpu *vcpu, int ra_val) {} +#endif + #endif diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 1992676c7a94..45c8ea4a0487 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -52,7 +52,7 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu) { #ifdef DEBUG_MMU_PTE_IP - return vcpu->arch.pc == DEBUG_MMU_PTE_IP; + return vcpu->arch.regs.nip == DEBUG_MMU_PTE_IP; #else return true; #endif diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index a93d719edc90..cf9d686e8162 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -38,7 +38,16 @@ static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu) { - kvmppc_set_msr(vcpu, vcpu->arch.intr_msr); + unsigned long msr = vcpu->arch.intr_msr; + unsigned long cur_msr = kvmppc_get_msr(vcpu); + + /* If transactional, change to suspend mode on IRQ delivery */ + if (MSR_TM_TRANSACTIONAL(cur_msr)) + msr |= MSR_TS_S; + else + msr |= cur_msr & MSR_TS_MASK; + + kvmppc_set_msr(vcpu, msr); } static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe( diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 1b3fcafc685e..7f3a8cf5d66f 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -272,6 +272,9 @@ int kvmppc_mmu_hv_init(void) if (!cpu_has_feature(CPU_FTR_HVMODE)) return -EINVAL; + if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE)) + return -EINVAL; + /* POWER7 has 10-bit LPIDs (12-bit in POWER8) */ host_lpid = mfspr(SPRN_LPID); rsvd_lpid = LPID_RSVD; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 481da8f93fa4..176f911ee983 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -139,44 +139,24 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, return 0; } -#ifdef CONFIG_PPC_64K_PAGES -#define MMU_BASE_PSIZE MMU_PAGE_64K -#else -#define MMU_BASE_PSIZE MMU_PAGE_4K -#endif - static void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr, unsigned int pshift) { - int psize = MMU_BASE_PSIZE; - - if (pshift >= PUD_SHIFT) - psize = MMU_PAGE_1G; - else if (pshift >= PMD_SHIFT) - psize = MMU_PAGE_2M; - addr &= ~0xfffUL; - addr |= mmu_psize_defs[psize].ap << 5; - asm volatile("ptesync": : :"memory"); - asm volatile(PPC_TLBIE_5(%0, %1, 0, 0, 1) - : : "r" (addr), "r" (kvm->arch.lpid) : "memory"); - if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) - asm volatile(PPC_TLBIE_5(%0, %1, 0, 0, 1) - : : "r" (addr), "r" (kvm->arch.lpid) : "memory"); - asm volatile("eieio ; tlbsync ; ptesync": : :"memory"); + unsigned long psize = PAGE_SIZE; + + if (pshift) + psize = 1UL << pshift; + + addr &= ~(psize - 1); + radix__flush_tlb_lpid_page(kvm->arch.lpid, addr, psize); } -static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned long addr) +static void kvmppc_radix_flush_pwc(struct kvm *kvm) { - unsigned long rb = 0x2 << PPC_BITLSHIFT(53); /* IS = 2 */ - - asm volatile("ptesync": : :"memory"); - /* RIC=1 PRS=0 R=1 IS=2 */ - asm volatile(PPC_TLBIE_5(%0, %1, 1, 0, 1) - : : "r" (rb), "r" (kvm->arch.lpid) : "memory"); - asm volatile("eieio ; tlbsync ; ptesync": : :"memory"); + radix__flush_pwc_lpid(kvm->arch.lpid); } -unsigned long kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep, +static unsigned long kvmppc_radix_update_pte(struct kvm *kvm, pte_t *ptep, unsigned long clr, unsigned long set, unsigned long addr, unsigned int shift) { @@ -228,6 +208,167 @@ static void kvmppc_pmd_free(pmd_t *pmdp) kmem_cache_free(kvm_pmd_cache, pmdp); } +static void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, + unsigned long gpa, unsigned int shift) + +{ + unsigned long page_size = 1ul << shift; + unsigned long old; + + old = kvmppc_radix_update_pte(kvm, pte, ~0UL, 0, gpa, shift); + kvmppc_radix_tlbie_page(kvm, gpa, shift); + if (old & _PAGE_DIRTY) { + unsigned long gfn = gpa >> PAGE_SHIFT; + struct kvm_memory_slot *memslot; + + memslot = gfn_to_memslot(kvm, gfn); + if (memslot && memslot->dirty_bitmap) + kvmppc_update_dirty_map(memslot, gfn, page_size); + } +} + +/* + * kvmppc_free_p?d are used to free existing page tables, and recursively + * descend and clear and free children. + * Callers are responsible for flushing the PWC. + * + * When page tables are being unmapped/freed as part of page fault path + * (full == false), ptes are not expected. There is code to unmap them + * and emit a warning if encountered, but there may already be data + * corruption due to the unexpected mappings. + */ +static void kvmppc_unmap_free_pte(struct kvm *kvm, pte_t *pte, bool full) +{ + if (full) { + memset(pte, 0, sizeof(long) << PTE_INDEX_SIZE); + } else { + pte_t *p = pte; + unsigned long it; + + for (it = 0; it < PTRS_PER_PTE; ++it, ++p) { + if (pte_val(*p) == 0) + continue; + WARN_ON_ONCE(1); + kvmppc_unmap_pte(kvm, p, + pte_pfn(*p) << PAGE_SHIFT, + PAGE_SHIFT); + } + } + + kvmppc_pte_free(pte); +} + +static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t *pmd, bool full) +{ + unsigned long im; + pmd_t *p = pmd; + + for (im = 0; im < PTRS_PER_PMD; ++im, ++p) { + if (!pmd_present(*p)) + continue; + if (pmd_is_leaf(*p)) { + if (full) { + pmd_clear(p); + } else { + WARN_ON_ONCE(1); + kvmppc_unmap_pte(kvm, (pte_t *)p, + pte_pfn(*(pte_t *)p) << PAGE_SHIFT, + PMD_SHIFT); + } + } else { + pte_t *pte; + + pte = pte_offset_map(p, 0); + kvmppc_unmap_free_pte(kvm, pte, full); + pmd_clear(p); + } + } + kvmppc_pmd_free(pmd); +} + +static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t *pud) +{ + unsigned long iu; + pud_t *p = pud; + + for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) { + if (!pud_present(*p)) + continue; + if (pud_huge(*p)) { + pud_clear(p); + } else { + pmd_t *pmd; + + pmd = pmd_offset(p, 0); + kvmppc_unmap_free_pmd(kvm, pmd, true); + pud_clear(p); + } + } + pud_free(kvm->mm, pud); +} + +void kvmppc_free_radix(struct kvm *kvm) +{ + unsigned long ig; + pgd_t *pgd; + + if (!kvm->arch.pgtable) + return; + pgd = kvm->arch.pgtable; + for (ig = 0; ig < PTRS_PER_PGD; ++ig, ++pgd) { + pud_t *pud; + + if (!pgd_present(*pgd)) + continue; + pud = pud_offset(pgd, 0); + kvmppc_unmap_free_pud(kvm, pud); + pgd_clear(pgd); + } + pgd_free(kvm->mm, kvm->arch.pgtable); + kvm->arch.pgtable = NULL; +} + +static void kvmppc_unmap_free_pmd_entry_table(struct kvm *kvm, pmd_t *pmd, + unsigned long gpa) +{ + pte_t *pte = pte_offset_kernel(pmd, 0); + + /* + * Clearing the pmd entry then flushing the PWC ensures that the pte + * page no longer be cached by the MMU, so can be freed without + * flushing the PWC again. + */ + pmd_clear(pmd); + kvmppc_radix_flush_pwc(kvm); + + kvmppc_unmap_free_pte(kvm, pte, false); +} + +static void kvmppc_unmap_free_pud_entry_table(struct kvm *kvm, pud_t *pud, + unsigned long gpa) +{ + pmd_t *pmd = pmd_offset(pud, 0); + + /* + * Clearing the pud entry then flushing the PWC ensures that the pmd + * page and any children pte pages will no longer be cached by the MMU, + * so can be freed without flushing the PWC again. + */ + pud_clear(pud); + kvmppc_radix_flush_pwc(kvm); + + kvmppc_unmap_free_pmd(kvm, pmd, false); +} + +/* + * There are a number of bits which may differ between different faults to + * the same partition scope entry. RC bits, in the course of cleaning and + * aging. And the write bit can change, either the access could have been + * upgraded, or a read fault could happen concurrently with a write fault + * that sets those bits first. + */ +#define PTE_BITS_MUST_MATCH (~(_PAGE_WRITE | _PAGE_DIRTY | _PAGE_ACCESSED)) + static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, unsigned int level, unsigned long mmu_seq) { @@ -235,7 +376,6 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, pud_t *pud, *new_pud = NULL; pmd_t *pmd, *new_pmd = NULL; pte_t *ptep, *new_ptep = NULL; - unsigned long old; int ret; /* Traverse the guest's 2nd-level tree, allocate new levels needed */ @@ -273,42 +413,39 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, if (pud_huge(*pud)) { unsigned long hgpa = gpa & PUD_MASK; + /* Check if we raced and someone else has set the same thing */ + if (level == 2) { + if (pud_raw(*pud) == pte_raw(pte)) { + ret = 0; + goto out_unlock; + } + /* Valid 1GB page here already, add our extra bits */ + WARN_ON_ONCE((pud_val(*pud) ^ pte_val(pte)) & + PTE_BITS_MUST_MATCH); + kvmppc_radix_update_pte(kvm, (pte_t *)pud, + 0, pte_val(pte), hgpa, PUD_SHIFT); + ret = 0; + goto out_unlock; + } /* * If we raced with another CPU which has just put * a 1GB pte in after we saw a pmd page, try again. */ - if (level <= 1 && !new_pmd) { + if (!new_pmd) { ret = -EAGAIN; goto out_unlock; } - /* Check if we raced and someone else has set the same thing */ - if (level == 2 && pud_raw(*pud) == pte_raw(pte)) { - ret = 0; - goto out_unlock; - } /* Valid 1GB page here already, remove it */ - old = kvmppc_radix_update_pte(kvm, (pte_t *)pud, - ~0UL, 0, hgpa, PUD_SHIFT); - kvmppc_radix_tlbie_page(kvm, hgpa, PUD_SHIFT); - if (old & _PAGE_DIRTY) { - unsigned long gfn = hgpa >> PAGE_SHIFT; - struct kvm_memory_slot *memslot; - memslot = gfn_to_memslot(kvm, gfn); - if (memslot && memslot->dirty_bitmap) - kvmppc_update_dirty_map(memslot, - gfn, PUD_SIZE); - } + kvmppc_unmap_pte(kvm, (pte_t *)pud, hgpa, PUD_SHIFT); } if (level == 2) { if (!pud_none(*pud)) { /* * There's a page table page here, but we wanted to * install a large page, so remove and free the page - * table page. new_pmd will be NULL since level == 2. + * table page. */ - new_pmd = pmd_offset(pud, 0); - pud_clear(pud); - kvmppc_radix_flush_pwc(kvm, gpa); + kvmppc_unmap_free_pud_entry_table(kvm, pud, gpa); } kvmppc_radix_set_pte_at(kvm, gpa, (pte_t *)pud, pte); ret = 0; @@ -324,42 +461,40 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, if (pmd_is_leaf(*pmd)) { unsigned long lgpa = gpa & PMD_MASK; + /* Check if we raced and someone else has set the same thing */ + if (level == 1) { + if (pmd_raw(*pmd) == pte_raw(pte)) { + ret = 0; + goto out_unlock; + } + /* Valid 2MB page here already, add our extra bits */ + WARN_ON_ONCE((pmd_val(*pmd) ^ pte_val(pte)) & + PTE_BITS_MUST_MATCH); + kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd), + 0, pte_val(pte), lgpa, PMD_SHIFT); + ret = 0; + goto out_unlock; + } + /* * If we raced with another CPU which has just put * a 2MB pte in after we saw a pte page, try again. */ - if (level == 0 && !new_ptep) { + if (!new_ptep) { ret = -EAGAIN; goto out_unlock; } - /* Check if we raced and someone else has set the same thing */ - if (level == 1 && pmd_raw(*pmd) == pte_raw(pte)) { - ret = 0; - goto out_unlock; - } /* Valid 2MB page here already, remove it */ - old = kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd), - ~0UL, 0, lgpa, PMD_SHIFT); - kvmppc_radix_tlbie_page(kvm, lgpa, PMD_SHIFT); - if (old & _PAGE_DIRTY) { - unsigned long gfn = lgpa >> PAGE_SHIFT; - struct kvm_memory_slot *memslot; - memslot = gfn_to_memslot(kvm, gfn); - if (memslot && memslot->dirty_bitmap) - kvmppc_update_dirty_map(memslot, - gfn, PMD_SIZE); - } + kvmppc_unmap_pte(kvm, pmdp_ptep(pmd), lgpa, PMD_SHIFT); } if (level == 1) { if (!pmd_none(*pmd)) { /* * There's a page table page here, but we wanted to * install a large page, so remove and free the page - * table page. new_ptep will be NULL since level == 1. + * table page. */ - new_ptep = pte_offset_kernel(pmd, 0); - pmd_clear(pmd); - kvmppc_radix_flush_pwc(kvm, gpa); + kvmppc_unmap_free_pmd_entry_table(kvm, pmd, gpa); } kvmppc_radix_set_pte_at(kvm, gpa, pmdp_ptep(pmd), pte); ret = 0; @@ -378,12 +513,12 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, ret = 0; goto out_unlock; } - /* PTE was previously valid, so invalidate it */ - old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT, - 0, gpa, 0); - kvmppc_radix_tlbie_page(kvm, gpa, 0); - if (old & _PAGE_DIRTY) - mark_page_dirty(kvm, gpa >> PAGE_SHIFT); + /* Valid page here already, add our extra bits */ + WARN_ON_ONCE((pte_val(*ptep) ^ pte_val(pte)) & + PTE_BITS_MUST_MATCH); + kvmppc_radix_update_pte(kvm, ptep, 0, pte_val(pte), gpa, 0); + ret = 0; + goto out_unlock; } kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte); ret = 0; @@ -565,9 +700,13 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long mask = (1ul << shift) - PAGE_SIZE; pte = __pte(pte_val(pte) | (hva & mask)); } - if (!(writing || upgrade_write)) - pte = __pte(pte_val(pte) & ~ _PAGE_WRITE); - pte = __pte(pte_val(pte) | _PAGE_EXEC); + pte = __pte(pte_val(pte) | _PAGE_EXEC | _PAGE_ACCESSED); + if (writing || upgrade_write) { + if (pte_val(pte) & _PAGE_WRITE) + pte = __pte(pte_val(pte) | _PAGE_DIRTY); + } else { + pte = __pte(pte_val(pte) & ~(_PAGE_WRITE | _PAGE_DIRTY)); + } } /* Allocate space in the tree and write the PTE */ @@ -734,51 +873,6 @@ int kvmppc_init_vm_radix(struct kvm *kvm) return 0; } -void kvmppc_free_radix(struct kvm *kvm) -{ - unsigned long ig, iu, im; - pte_t *pte; - pmd_t *pmd; - pud_t *pud; - pgd_t *pgd; - - if (!kvm->arch.pgtable) - return; - pgd = kvm->arch.pgtable; - for (ig = 0; ig < PTRS_PER_PGD; ++ig, ++pgd) { - if (!pgd_present(*pgd)) - continue; - pud = pud_offset(pgd, 0); - for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++pud) { - if (!pud_present(*pud)) - continue; - if (pud_huge(*pud)) { - pud_clear(pud); - continue; - } - pmd = pmd_offset(pud, 0); - for (im = 0; im < PTRS_PER_PMD; ++im, ++pmd) { - if (pmd_is_leaf(*pmd)) { - pmd_clear(pmd); - continue; - } - if (!pmd_present(*pmd)) - continue; - pte = pte_offset_map(pmd, 0); - memset(pte, 0, sizeof(long) << PTE_INDEX_SIZE); - kvmppc_pte_free(pte); - pmd_clear(pmd); - } - kvmppc_pmd_free(pmd_offset(pud, 0)); - pud_clear(pud); - } - pud_free(kvm->mm, pud_offset(pgd, 0)); - pgd_clear(pgd); - } - pgd_free(kvm->mm, kvm->arch.pgtable); - kvm->arch.pgtable = NULL; -} - static void pte_ctor(void *addr) { memset(addr, 0, RADIX_PTE_TABLE_SIZE); diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index 4dffa611376d..d066e37551ec 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd, if (!tbltmp) continue; - /* - * Make sure hardware table parameters are exactly the same; - * this is used in the TCE handlers where boundary checks - * use only the first attached table. - */ - if ((tbltmp->it_page_shift == stt->page_shift) && - (tbltmp->it_offset == stt->offset) && - (tbltmp->it_size == stt->size)) { + /* Make sure hardware table parameters are compatible */ + if ((tbltmp->it_page_shift <= stt->page_shift) && + (tbltmp->it_offset << tbltmp->it_page_shift == + stt->offset << stt->page_shift) && + (tbltmp->it_size << tbltmp->it_page_shift == + stt->size << stt->page_shift)) { /* * Reference the table to avoid races with * add/remove DMA windows. @@ -237,7 +235,7 @@ static void release_spapr_tce_table(struct rcu_head *head) kfree(stt); } -static int kvm_spapr_tce_fault(struct vm_fault *vmf) +static vm_fault_t kvm_spapr_tce_fault(struct vm_fault *vmf) { struct kvmppc_spapr_tce_table *stt = vmf->vma->vm_file->private_data; struct page *page; @@ -302,7 +300,8 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, int ret = -ENOMEM; int i; - if (!args->size) + if (!args->size || args->page_shift < 12 || args->page_shift > 34 || + (args->offset + args->size > (ULLONG_MAX >> args->page_shift))) return -EINVAL; size = _ALIGN_UP(args->size, PAGE_SIZE >> 3); @@ -396,7 +395,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm, return H_SUCCESS; } -static long kvmppc_tce_iommu_unmap(struct kvm *kvm, +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm, struct iommu_table *tbl, unsigned long entry) { enum dma_data_direction dir = DMA_NONE; @@ -416,7 +415,24 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm, return ret; } -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl, +static long kvmppc_tce_iommu_unmap(struct kvm *kvm, + struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl, + unsigned long entry) +{ + unsigned long i, ret = H_SUCCESS; + unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift); + unsigned long io_entry = entry * subpages; + + for (i = 0; i < subpages; ++i) { + ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + i); + if (ret != H_SUCCESS) + break; + } + + return ret; +} + +long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl, unsigned long entry, unsigned long ua, enum dma_data_direction dir) { @@ -453,6 +469,27 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl, return 0; } +static long kvmppc_tce_iommu_map(struct kvm *kvm, + struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl, + unsigned long entry, unsigned long ua, + enum dma_data_direction dir) +{ + unsigned long i, pgoff, ret = H_SUCCESS; + unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift); + unsigned long io_entry = entry * subpages; + + for (i = 0, pgoff = 0; i < subpages; + ++i, pgoff += IOMMU_PAGE_SIZE(tbl)) { + + ret = kvmppc_tce_iommu_do_map(kvm, tbl, + io_entry + i, ua + pgoff, dir); + if (ret != H_SUCCESS) + break; + } + + return ret; +} + long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba, unsigned long tce) { @@ -491,10 +528,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { if (dir == DMA_NONE) - ret = kvmppc_tce_iommu_unmap(vcpu->kvm, + ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt, stit->tbl, entry); else - ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl, + ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl, entry, ua, dir); if (ret == H_SUCCESS) @@ -570,7 +607,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu, return H_PARAMETER; list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { - ret = kvmppc_tce_iommu_map(vcpu->kvm, + ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl, entry + i, ua, iommu_tce_direction(tce)); @@ -615,10 +652,10 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu, return H_PARAMETER; list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { - unsigned long entry = ioba >> stit->tbl->it_page_shift; + unsigned long entry = ioba >> stt->page_shift; for (i = 0; i < npages; ++i) { - ret = kvmppc_tce_iommu_unmap(vcpu->kvm, + ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt, stit->tbl, entry + i); if (ret == H_SUCCESS) diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c index 6651f736a0b1..925fc316a104 100644 --- a/arch/powerpc/kvm/book3s_64_vio_hv.c +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c @@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm, return H_SUCCESS; } -static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm, +static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm, struct iommu_table *tbl, unsigned long entry) { enum dma_data_direction dir = DMA_NONE; @@ -245,7 +245,24 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm, return ret; } -static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl, +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm, + struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl, + unsigned long entry) +{ + unsigned long i, ret = H_SUCCESS; + unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift); + unsigned long io_entry = entry * subpages; + + for (i = 0; i < subpages; ++i) { + ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + i); + if (ret != H_SUCCESS) + break; + } + + return ret; +} + +static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl, unsigned long entry, unsigned long ua, enum dma_data_direction dir) { @@ -290,6 +307,27 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl, return 0; } +static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, + struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl, + unsigned long entry, unsigned long ua, + enum dma_data_direction dir) +{ + unsigned long i, pgoff, ret = H_SUCCESS; + unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift); + unsigned long io_entry = entry * subpages; + + for (i = 0, pgoff = 0; i < subpages; + ++i, pgoff += IOMMU_PAGE_SIZE(tbl)) { + + ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl, + io_entry + i, ua + pgoff, dir); + if (ret != H_SUCCESS) + break; + } + + return ret; +} + long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba, unsigned long tce) { @@ -327,10 +365,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { if (dir == DMA_NONE) - ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, + ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt, stit->tbl, entry); else - ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, + ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt, stit->tbl, entry, ua, dir); if (ret == H_SUCCESS) @@ -477,7 +515,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu, return H_PARAMETER; list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { - ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, + ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt, stit->tbl, entry + i, ua, iommu_tce_direction(tce)); @@ -526,10 +564,10 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu, return H_PARAMETER; list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { - unsigned long entry = ioba >> stit->tbl->it_page_shift; + unsigned long entry = ioba >> stt->page_shift; for (i = 0; i < npages; ++i) { - ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, + ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt, stit->tbl, entry + i); if (ret == H_SUCCESS) @@ -571,7 +609,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, page = stt->pages[idx / TCES_PER_PAGE]; tbl = (u64 *)page_address(page); - vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE]; + vcpu->arch.regs.gpr[4] = tbl[idx % TCES_PER_PAGE]; return H_SUCCESS; } diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 68d68983948e..36b11c5a0dbb 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -23,7 +23,9 @@ #include <asm/reg.h> #include <asm/switch_to.h> #include <asm/time.h> +#include <asm/tm.h> #include "book3s.h" +#include <asm/asm-prototypes.h> #define OP_19_XOP_RFID 18 #define OP_19_XOP_RFI 50 @@ -47,6 +49,12 @@ #define OP_31_XOP_EIOIO 854 #define OP_31_XOP_SLBMFEE 915 +#define OP_31_XOP_TBEGIN 654 +#define OP_31_XOP_TABORT 910 + +#define OP_31_XOP_TRECLAIM 942 +#define OP_31_XOP_TRCHKPT 1006 + /* DCBZ is actually 1014, but we patch it to 1010 so we get a trap */ #define OP_31_XOP_DCBZ 1010 @@ -87,6 +95,157 @@ static bool spr_allowed(struct kvm_vcpu *vcpu, enum priv_level level) return true; } +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +static inline void kvmppc_copyto_vcpu_tm(struct kvm_vcpu *vcpu) +{ + memcpy(&vcpu->arch.gpr_tm[0], &vcpu->arch.regs.gpr[0], + sizeof(vcpu->arch.gpr_tm)); + memcpy(&vcpu->arch.fp_tm, &vcpu->arch.fp, + sizeof(struct thread_fp_state)); + memcpy(&vcpu->arch.vr_tm, &vcpu->arch.vr, + sizeof(struct thread_vr_state)); + vcpu->arch.ppr_tm = vcpu->arch.ppr; + vcpu->arch.dscr_tm = vcpu->arch.dscr; + vcpu->arch.amr_tm = vcpu->arch.amr; + vcpu->arch.ctr_tm = vcpu->arch.regs.ctr; + vcpu->arch.tar_tm = vcpu->arch.tar; + vcpu->arch.lr_tm = vcpu->arch.regs.link; + vcpu->arch.cr_tm = vcpu->arch.cr; + vcpu->arch.xer_tm = vcpu->arch.regs.xer; + vcpu->arch.vrsave_tm = vcpu->arch.vrsave; +} + +static inline void kvmppc_copyfrom_vcpu_tm(struct kvm_vcpu *vcpu) +{ + memcpy(&vcpu->arch.regs.gpr[0], &vcpu->arch.gpr_tm[0], + sizeof(vcpu->arch.regs.gpr)); + memcpy(&vcpu->arch.fp, &vcpu->arch.fp_tm, + sizeof(struct thread_fp_state)); + memcpy(&vcpu->arch.vr, &vcpu->arch.vr_tm, + sizeof(struct thread_vr_state)); + vcpu->arch.ppr = vcpu->arch.ppr_tm; + vcpu->arch.dscr = vcpu->arch.dscr_tm; + vcpu->arch.amr = vcpu->arch.amr_tm; + vcpu->arch.regs.ctr = vcpu->arch.ctr_tm; + vcpu->arch.tar = vcpu->arch.tar_tm; + vcpu->arch.regs.link = vcpu->arch.lr_tm; + vcpu->arch.cr = vcpu->arch.cr_tm; + vcpu->arch.regs.xer = vcpu->arch.xer_tm; + vcpu->arch.vrsave = vcpu->arch.vrsave_tm; +} + +static void kvmppc_emulate_treclaim(struct kvm_vcpu *vcpu, int ra_val) +{ + unsigned long guest_msr = kvmppc_get_msr(vcpu); + int fc_val = ra_val ? ra_val : 1; + uint64_t texasr; + + /* CR0 = 0 | MSR[TS] | 0 */ + vcpu->arch.cr = (vcpu->arch.cr & ~(CR0_MASK << CR0_SHIFT)) | + (((guest_msr & MSR_TS_MASK) >> (MSR_TS_S_LG - 1)) + << CR0_SHIFT); + + preempt_disable(); + tm_enable(); + texasr = mfspr(SPRN_TEXASR); + kvmppc_save_tm_pr(vcpu); + kvmppc_copyfrom_vcpu_tm(vcpu); + + /* failure recording depends on Failure Summary bit */ + if (!(texasr & TEXASR_FS)) { + texasr &= ~TEXASR_FC; + texasr |= ((u64)fc_val << TEXASR_FC_LG) | TEXASR_FS; + + texasr &= ~(TEXASR_PR | TEXASR_HV); + if (kvmppc_get_msr(vcpu) & MSR_PR) + texasr |= TEXASR_PR; + + if (kvmppc_get_msr(vcpu) & MSR_HV) + texasr |= TEXASR_HV; + + vcpu->arch.texasr = texasr; + vcpu->arch.tfiar = kvmppc_get_pc(vcpu); + mtspr(SPRN_TEXASR, texasr); + mtspr(SPRN_TFIAR, vcpu->arch.tfiar); + } + tm_disable(); + /* + * treclaim need quit to non-transactional state. + */ + guest_msr &= ~(MSR_TS_MASK); + kvmppc_set_msr(vcpu, guest_msr); + preempt_enable(); + + if (vcpu->arch.shadow_fscr & FSCR_TAR) + mtspr(SPRN_TAR, vcpu->arch.tar); +} + +static void kvmppc_emulate_trchkpt(struct kvm_vcpu *vcpu) +{ + unsigned long guest_msr = kvmppc_get_msr(vcpu); + + preempt_disable(); + /* + * need flush FP/VEC/VSX to vcpu save area before + * copy. + */ + kvmppc_giveup_ext(vcpu, MSR_VSX); + kvmppc_giveup_fac(vcpu, FSCR_TAR_LG); + kvmppc_copyto_vcpu_tm(vcpu); + kvmppc_save_tm_sprs(vcpu); + + /* + * as a result of trecheckpoint. set TS to suspended. + */ + guest_msr &= ~(MSR_TS_MASK); + guest_msr |= MSR_TS_S; + kvmppc_set_msr(vcpu, guest_msr); + kvmppc_restore_tm_pr(vcpu); + preempt_enable(); +} + +/* emulate tabort. at guest privilege state */ +void kvmppc_emulate_tabort(struct kvm_vcpu *vcpu, int ra_val) +{ + /* currently we only emulate tabort. but no emulation of other + * tabort variants since there is no kernel usage of them at + * present. + */ + unsigned long guest_msr = kvmppc_get_msr(vcpu); + uint64_t org_texasr; + + preempt_disable(); + tm_enable(); + org_texasr = mfspr(SPRN_TEXASR); + tm_abort(ra_val); + + /* CR0 = 0 | MSR[TS] | 0 */ + vcpu->arch.cr = (vcpu->arch.cr & ~(CR0_MASK << CR0_SHIFT)) | + (((guest_msr & MSR_TS_MASK) >> (MSR_TS_S_LG - 1)) + << CR0_SHIFT); + + vcpu->arch.texasr = mfspr(SPRN_TEXASR); + /* failure recording depends on Failure Summary bit, + * and tabort will be treated as nops in non-transactional + * state. + */ + if (!(org_texasr & TEXASR_FS) && + MSR_TM_ACTIVE(guest_msr)) { + vcpu->arch.texasr &= ~(TEXASR_PR | TEXASR_HV); + if (guest_msr & MSR_PR) + vcpu->arch.texasr |= TEXASR_PR; + + if (guest_msr & MSR_HV) + vcpu->arch.texasr |= TEXASR_HV; + + vcpu->arch.tfiar = kvmppc_get_pc(vcpu); + } + tm_disable(); + preempt_enable(); +} + +#endif + int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int inst, int *advance) { @@ -117,11 +276,28 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, case 19: switch (get_xop(inst)) { case OP_19_XOP_RFID: - case OP_19_XOP_RFI: + case OP_19_XOP_RFI: { + unsigned long srr1 = kvmppc_get_srr1(vcpu); +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + unsigned long cur_msr = kvmppc_get_msr(vcpu); + + /* + * add rules to fit in ISA specification regarding TM + * state transistion in TM disable/Suspended state, + * and target TM state is TM inactive(00) state. (the + * change should be suppressed). + */ + if (((cur_msr & MSR_TM) == 0) && + ((srr1 & MSR_TM) == 0) && + MSR_TM_SUSPENDED(cur_msr) && + !MSR_TM_ACTIVE(srr1)) + srr1 |= MSR_TS_S; +#endif kvmppc_set_pc(vcpu, kvmppc_get_srr0(vcpu)); - kvmppc_set_msr(vcpu, kvmppc_get_srr1(vcpu)); + kvmppc_set_msr(vcpu, srr1); *advance = 0; break; + } default: emulated = EMULATE_FAIL; @@ -304,6 +480,140 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, break; } +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + case OP_31_XOP_TBEGIN: + { + if (!cpu_has_feature(CPU_FTR_TM)) + break; + + if (!(kvmppc_get_msr(vcpu) & MSR_TM)) { + kvmppc_trigger_fac_interrupt(vcpu, FSCR_TM_LG); + emulated = EMULATE_AGAIN; + break; + } + + if (!(kvmppc_get_msr(vcpu) & MSR_PR)) { + preempt_disable(); + vcpu->arch.cr = (CR0_TBEGIN_FAILURE | + (vcpu->arch.cr & ~(CR0_MASK << CR0_SHIFT))); + + vcpu->arch.texasr = (TEXASR_FS | TEXASR_EXACT | + (((u64)(TM_CAUSE_EMULATE | TM_CAUSE_PERSISTENT)) + << TEXASR_FC_LG)); + + if ((inst >> 21) & 0x1) + vcpu->arch.texasr |= TEXASR_ROT; + + if (kvmppc_get_msr(vcpu) & MSR_HV) + vcpu->arch.texasr |= TEXASR_HV; + + vcpu->arch.tfhar = kvmppc_get_pc(vcpu) + 4; + vcpu->arch.tfiar = kvmppc_get_pc(vcpu); + + kvmppc_restore_tm_sprs(vcpu); + preempt_enable(); + } else + emulated = EMULATE_FAIL; + break; + } + case OP_31_XOP_TABORT: + { + ulong guest_msr = kvmppc_get_msr(vcpu); + unsigned long ra_val = 0; + + if (!cpu_has_feature(CPU_FTR_TM)) + break; + + if (!(kvmppc_get_msr(vcpu) & MSR_TM)) { + kvmppc_trigger_fac_interrupt(vcpu, FSCR_TM_LG); + emulated = EMULATE_AGAIN; + break; + } + + /* only emulate for privilege guest, since problem state + * guest can run with TM enabled and we don't expect to + * trap at here for that case. + */ + WARN_ON(guest_msr & MSR_PR); + + if (ra) + ra_val = kvmppc_get_gpr(vcpu, ra); + + kvmppc_emulate_tabort(vcpu, ra_val); + break; + } + case OP_31_XOP_TRECLAIM: + { + ulong guest_msr = kvmppc_get_msr(vcpu); + unsigned long ra_val = 0; + + if (!cpu_has_feature(CPU_FTR_TM)) + break; + + if (!(kvmppc_get_msr(vcpu) & MSR_TM)) { + kvmppc_trigger_fac_interrupt(vcpu, FSCR_TM_LG); + emulated = EMULATE_AGAIN; + break; + } + + /* generate interrupts based on priorities */ + if (guest_msr & MSR_PR) { + /* Privileged Instruction type Program Interrupt */ + kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV); + emulated = EMULATE_AGAIN; + break; + } + + if (!MSR_TM_ACTIVE(guest_msr)) { + /* TM bad thing interrupt */ + kvmppc_core_queue_program(vcpu, SRR1_PROGTM); + emulated = EMULATE_AGAIN; + break; + } + + if (ra) + ra_val = kvmppc_get_gpr(vcpu, ra); + kvmppc_emulate_treclaim(vcpu, ra_val); + break; + } + case OP_31_XOP_TRCHKPT: + { + ulong guest_msr = kvmppc_get_msr(vcpu); + unsigned long texasr; + + if (!cpu_has_feature(CPU_FTR_TM)) + break; + + if (!(kvmppc_get_msr(vcpu) & MSR_TM)) { + kvmppc_trigger_fac_interrupt(vcpu, FSCR_TM_LG); + emulated = EMULATE_AGAIN; + break; + } + + /* generate interrupt based on priorities */ + if (guest_msr & MSR_PR) { + /* Privileged Instruction type Program Intr */ + kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV); + emulated = EMULATE_AGAIN; + break; + } + + tm_enable(); + texasr = mfspr(SPRN_TEXASR); + tm_disable(); + + if (MSR_TM_ACTIVE(guest_msr) || + !(texasr & (TEXASR_FS))) { + /* TM bad thing interrupt */ + kvmppc_core_queue_program(vcpu, SRR1_PROGTM); + emulated = EMULATE_AGAIN; + break; + } + + kvmppc_emulate_trchkpt(vcpu); + break; + } +#endif default: emulated = EMULATE_FAIL; } @@ -465,13 +775,38 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) break; #ifdef CONFIG_PPC_TRANSACTIONAL_MEM case SPRN_TFHAR: - vcpu->arch.tfhar = spr_val; - break; case SPRN_TEXASR: - vcpu->arch.texasr = spr_val; - break; case SPRN_TFIAR: - vcpu->arch.tfiar = spr_val; + if (!cpu_has_feature(CPU_FTR_TM)) + break; + + if (!(kvmppc_get_msr(vcpu) & MSR_TM)) { + kvmppc_trigger_fac_interrupt(vcpu, FSCR_TM_LG); + emulated = EMULATE_AGAIN; + break; + } + + if (MSR_TM_ACTIVE(kvmppc_get_msr(vcpu)) && + !((MSR_TM_SUSPENDED(kvmppc_get_msr(vcpu))) && + (sprn == SPRN_TFHAR))) { + /* it is illegal to mtspr() TM regs in + * other than non-transactional state, with + * the exception of TFHAR in suspend state. + */ + kvmppc_core_queue_program(vcpu, SRR1_PROGTM); + emulated = EMULATE_AGAIN; + break; + } + + tm_enable(); + if (sprn == SPRN_TFHAR) + mtspr(SPRN_TFHAR, spr_val); + else if (sprn == SPRN_TEXASR) + mtspr(SPRN_TEXASR, spr_val); + else + mtspr(SPRN_TFIAR, spr_val); + tm_disable(); + break; #endif #endif @@ -618,13 +953,25 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val break; #ifdef CONFIG_PPC_TRANSACTIONAL_MEM case SPRN_TFHAR: - *spr_val = vcpu->arch.tfhar; - break; case SPRN_TEXASR: - *spr_val = vcpu->arch.texasr; - break; case SPRN_TFIAR: - *spr_val = vcpu->arch.tfiar; + if (!cpu_has_feature(CPU_FTR_TM)) + break; + + if (!(kvmppc_get_msr(vcpu) & MSR_TM)) { + kvmppc_trigger_fac_interrupt(vcpu, FSCR_TM_LG); + emulated = EMULATE_AGAIN; + break; + } + + tm_enable(); + if (sprn == SPRN_TFHAR) + *spr_val = mfspr(SPRN_TFHAR); + else if (sprn == SPRN_TEXASR) + *spr_val = mfspr(SPRN_TEXASR); + else if (sprn == SPRN_TFIAR) + *spr_val = mfspr(SPRN_TFIAR); + tm_disable(); break; #endif #endif diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8858ab8b6ca4..de686b340f4a 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -123,6 +123,32 @@ static bool no_mixing_hpt_and_radix; static void kvmppc_end_cede(struct kvm_vcpu *vcpu); static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); +/* + * RWMR values for POWER8. These control the rate at which PURR + * and SPURR count and should be set according to the number of + * online threads in the vcore being run. + */ +#define RWMR_RPA_P8_1THREAD 0x164520C62609AECA +#define RWMR_RPA_P8_2THREAD 0x7FFF2908450D8DA9 +#define RWMR_RPA_P8_3THREAD 0x164520C62609AECA +#define RWMR_RPA_P8_4THREAD 0x199A421245058DA9 +#define RWMR_RPA_P8_5THREAD 0x164520C62609AECA +#define RWMR_RPA_P8_6THREAD 0x164520C62609AECA +#define RWMR_RPA_P8_7THREAD 0x164520C62609AECA +#define RWMR_RPA_P8_8THREAD 0x164520C62609AECA + +static unsigned long p8_rwmr_values[MAX_SMT_THREADS + 1] = { + RWMR_RPA_P8_1THREAD, + RWMR_RPA_P8_1THREAD, + RWMR_RPA_P8_2THREAD, + RWMR_RPA_P8_3THREAD, + RWMR_RPA_P8_4THREAD, + RWMR_RPA_P8_5THREAD, + RWMR_RPA_P8_6THREAD, + RWMR_RPA_P8_7THREAD, + RWMR_RPA_P8_8THREAD, +}; + static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc, int *ip) { @@ -371,13 +397,13 @@ static void kvmppc_dump_regs(struct kvm_vcpu *vcpu) pr_err("vcpu %p (%d):\n", vcpu, vcpu->vcpu_id); pr_err("pc = %.16lx msr = %.16llx trap = %x\n", - vcpu->arch.pc, vcpu->arch.shregs.msr, vcpu->arch.trap); + vcpu->arch.regs.nip, vcpu->arch.shregs.msr, vcpu->arch.trap); for (r = 0; r < 16; ++r) pr_err("r%2d = %.16lx r%d = %.16lx\n", r, kvmppc_get_gpr(vcpu, r), r+16, kvmppc_get_gpr(vcpu, r+16)); pr_err("ctr = %.16lx lr = %.16lx\n", - vcpu->arch.ctr, vcpu->arch.lr); + vcpu->arch.regs.ctr, vcpu->arch.regs.link); pr_err("srr0 = %.16llx srr1 = %.16llx\n", vcpu->arch.shregs.srr0, vcpu->arch.shregs.srr1); pr_err("sprg0 = %.16llx sprg1 = %.16llx\n", @@ -385,7 +411,7 @@ static void kvmppc_dump_regs(struct kvm_vcpu *vcpu) pr_err("sprg2 = %.16llx sprg3 = %.16llx\n", vcpu->arch.shregs.sprg2, vcpu->arch.shregs.sprg3); pr_err("cr = %.8x xer = %.16lx dsisr = %.8x\n", - vcpu->arch.cr, vcpu->arch.xer, vcpu->arch.shregs.dsisr); + vcpu->arch.cr, vcpu->arch.regs.xer, vcpu->arch.shregs.dsisr); pr_err("dar = %.16llx\n", vcpu->arch.shregs.dar); pr_err("fault dar = %.16lx dsisr = %.8x\n", vcpu->arch.fault_dar, vcpu->arch.fault_dsisr); @@ -1526,6 +1552,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, *val = get_reg_val(id, vcpu->arch.dec_expires + vcpu->arch.vcore->tb_offset); break; + case KVM_REG_PPC_ONLINE: + *val = get_reg_val(id, vcpu->arch.online); + break; default: r = -EINVAL; break; @@ -1757,6 +1786,14 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, vcpu->arch.dec_expires = set_reg_val(id, *val) - vcpu->arch.vcore->tb_offset; break; + case KVM_REG_PPC_ONLINE: + i = set_reg_val(id, *val); + if (i && !vcpu->arch.online) + atomic_inc(&vcpu->arch.vcore->online_count); + else if (!i && vcpu->arch.online) + atomic_dec(&vcpu->arch.vcore->online_count); + vcpu->arch.online = i; + break; default: r = -EINVAL; break; @@ -2850,6 +2887,25 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) } } + /* + * On POWER8, set RWMR register. + * Since it only affects PURR and SPURR, it doesn't affect + * the host, so we don't save/restore the host value. + */ + if (is_power8) { + unsigned long rwmr_val = RWMR_RPA_P8_8THREAD; + int n_online = atomic_read(&vc->online_count); + + /* + * Use the 8-thread value if we're doing split-core + * or if the vcore's online count looks bogus. + */ + if (split == 1 && threads_per_subcore == MAX_SMT_THREADS && + n_online >= 1 && n_online <= MAX_SMT_THREADS) + rwmr_val = p8_rwmr_values[n_online]; + mtspr(SPRN_RWMR, rwmr_val); + } + /* Start all the threads */ active = 0; for (sub = 0; sub < core_info.n_subcores; ++sub) { @@ -2902,6 +2958,32 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) for (sub = 0; sub < core_info.n_subcores; ++sub) spin_unlock(&core_info.vc[sub]->lock); + if (kvm_is_radix(vc->kvm)) { + int tmp = pcpu; + + /* + * Do we need to flush the process scoped TLB for the LPAR? + * + * On POWER9, individual threads can come in here, but the + * TLB is shared between the 4 threads in a core, hence + * invalidating on one thread invalidates for all. + * Thus we make all 4 threads use the same bit here. + * + * Hash must be flushed in realmode in order to use tlbiel. + */ + mtspr(SPRN_LPID, vc->kvm->arch.lpid); + isync(); + + if (cpu_has_feature(CPU_FTR_ARCH_300)) + tmp &= ~0x3UL; + + if (cpumask_test_cpu(tmp, &vc->kvm->arch.need_tlb_flush)) { + radix__local_flush_tlb_lpid_guest(vc->kvm->arch.lpid); + /* Clear the bit after the TLB flush */ + cpumask_clear_cpu(tmp, &vc->kvm->arch.need_tlb_flush); + } + } + /* * Interrupts will be enabled once we get into the guest, * so tell lockdep that we're about to enable interrupts. @@ -3356,6 +3438,15 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu) } #endif + /* + * Force online to 1 for the sake of old userspace which doesn't + * set it. + */ + if (!vcpu->arch.online) { + atomic_inc(&vcpu->arch.vcore->online_count); + vcpu->arch.online = 1; + } + kvmppc_core_prepare_to_enter(vcpu); /* No need to go into the guest when all we'll do is come back out */ diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index de18299f92b7..d4a3f4da409b 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -18,6 +18,7 @@ #include <linux/cma.h> #include <linux/bitops.h> +#include <asm/asm-prototypes.h> #include <asm/cputable.h> #include <asm/kvm_ppc.h> #include <asm/kvm_book3s.h> @@ -211,9 +212,9 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu) /* Only need to do the expensive mfmsr() on radix */ if (kvm_is_radix(vcpu->kvm) && (mfmsr() & MSR_IR)) - r = powernv_get_random_long(&vcpu->arch.gpr[4]); + r = powernv_get_random_long(&vcpu->arch.regs.gpr[4]); else - r = powernv_get_random_real_mode(&vcpu->arch.gpr[4]); + r = powernv_get_random_real_mode(&vcpu->arch.regs.gpr[4]); if (r) return H_SUCCESS; @@ -562,7 +563,7 @@ unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu) { if (!kvmppc_xics_enabled(vcpu)) return H_TOO_HARD; - vcpu->arch.gpr[5] = get_tb(); + vcpu->arch.regs.gpr[5] = get_tb(); if (xive_enabled()) { if (is_rm()) return xive_rm_h_xirr(vcpu); @@ -633,7 +634,19 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr) void kvmppc_bad_interrupt(struct pt_regs *regs) { - die("Bad interrupt in KVM entry/exit code", regs, SIGABRT); + /* + * 100 could happen at any time, 200 can happen due to invalid real + * address access for example (or any time due to a hardware problem). + */ + if (TRAP(regs) == 0x100) { + get_paca()->in_nmi++; + system_reset_exception(regs); + get_paca()->in_nmi--; + } else if (TRAP(regs) == 0x200) { + machine_check_exception(regs); + } else { + die("Bad interrupt in KVM entry/exit code", regs, SIGABRT); + } panic("Bad KVM trap"); } diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S index 0e8493033288..82f2ff9410b6 100644 --- a/arch/powerpc/kvm/book3s_hv_interrupts.S +++ b/arch/powerpc/kvm/book3s_hv_interrupts.S @@ -137,7 +137,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) /* * We return here in virtual mode after the guest exits * with something that we can't handle in real mode. - * Interrupts are enabled again at this point. + * Interrupts are still hard-disabled. */ /* diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 78e6a392330f..1f22d9e977d4 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -418,7 +418,8 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, long pte_index, unsigned long pteh, unsigned long ptel) { return kvmppc_do_h_enter(vcpu->kvm, flags, pte_index, pteh, ptel, - vcpu->arch.pgdir, true, &vcpu->arch.gpr[4]); + vcpu->arch.pgdir, true, + &vcpu->arch.regs.gpr[4]); } #ifdef __BIG_ENDIAN__ @@ -434,24 +435,6 @@ static inline int is_mmio_hpte(unsigned long v, unsigned long r) (HPTE_R_KEY_HI | HPTE_R_KEY_LO)); } -static inline int try_lock_tlbie(unsigned int *lock) -{ - unsigned int tmp, old; - unsigned int token = LOCK_TOKEN; - - asm volatile("1:lwarx %1,0,%2\n" - " cmpwi cr0,%1,0\n" - " bne 2f\n" - " stwcx. %3,0,%2\n" - " bne- 1b\n" - " isync\n" - "2:" - : "=&r" (tmp), "=&r" (old) - : "r" (lock), "r" (token) - : "cc", "memory"); - return old == 0; -} - static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues, long npages, int global, bool need_sync) { @@ -463,8 +446,6 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues, * the RS field, this is backwards-compatible with P7 and P8. */ if (global) { - while (!try_lock_tlbie(&kvm->arch.tlbie_lock)) - cpu_relax(); if (need_sync) asm volatile("ptesync" : : : "memory"); for (i = 0; i < npages; ++i) { @@ -483,7 +464,6 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues, } asm volatile("eieio; tlbsync; ptesync" : : : "memory"); - kvm->arch.tlbie_lock = 0; } else { if (need_sync) asm volatile("ptesync" : : : "memory"); @@ -561,13 +541,13 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags, unsigned long pte_index, unsigned long avpn) { return kvmppc_do_h_remove(vcpu->kvm, flags, pte_index, avpn, - &vcpu->arch.gpr[4]); + &vcpu->arch.regs.gpr[4]); } long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu) { struct kvm *kvm = vcpu->kvm; - unsigned long *args = &vcpu->arch.gpr[4]; + unsigned long *args = &vcpu->arch.regs.gpr[4]; __be64 *hp, *hptes[4]; unsigned long tlbrb[4]; long int i, j, k, n, found, indexes[4]; @@ -787,8 +767,8 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags, r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C)); r &= ~HPTE_GR_RESERVED; } - vcpu->arch.gpr[4 + i * 2] = v; - vcpu->arch.gpr[5 + i * 2] = r; + vcpu->arch.regs.gpr[4 + i * 2] = v; + vcpu->arch.regs.gpr[5 + i * 2] = r; } return H_SUCCESS; } @@ -834,7 +814,7 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags, } } } - vcpu->arch.gpr[4] = gr; + vcpu->arch.regs.gpr[4] = gr; ret = H_SUCCESS; out: unlock_hpte(hpte, v & ~HPTE_V_HVLOCK); @@ -881,7 +861,7 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags, kvmppc_set_dirty_from_hpte(kvm, v, gr); } } - vcpu->arch.gpr[4] = gr; + vcpu->arch.regs.gpr[4] = gr; ret = H_SUCCESS; out: unlock_hpte(hpte, v & ~HPTE_V_HVLOCK); diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 2a862618f072..758d1d23215e 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -517,7 +517,7 @@ unsigned long xics_rm_h_xirr(struct kvm_vcpu *vcpu) } while (!icp_rm_try_update(icp, old_state, new_state)); /* Return the result in GPR4 */ - vcpu->arch.gpr[4] = xirr; + vcpu->arch.regs.gpr[4] = xirr; return check_too_hard(xics, icp); } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index b97d261d3b89..153988d878e8 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -39,8 +39,6 @@ BEGIN_FTR_SECTION; \ extsw reg, reg; \ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) -#define VCPU_GPRS_TM(reg) (((reg) * ULONG_SIZE) + VCPU_GPR_TM) - /* Values in HSTATE_NAPPING(r13) */ #define NAPPING_CEDE 1 #define NAPPING_NOVCPU 2 @@ -639,6 +637,10 @@ kvmppc_hv_entry: /* Primary thread switches to guest partition. */ cmpwi r6,0 bne 10f + + /* Radix has already switched LPID and flushed core TLB */ + bne cr7, 22f + lwz r7,KVM_LPID(r9) BEGIN_FTR_SECTION ld r6,KVM_SDR1(r9) @@ -650,7 +652,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) mtspr SPRN_LPID,r7 isync - /* See if we need to flush the TLB */ + /* See if we need to flush the TLB. Hash has to be done in RM */ lhz r6,PACAPACAINDEX(r13) /* test_bit(cpu, need_tlb_flush) */ BEGIN_FTR_SECTION /* @@ -677,15 +679,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) li r7,0x800 /* IS field = 0b10 */ ptesync li r0,0 /* RS for P9 version of tlbiel */ - bne cr7, 29f 28: tlbiel r7 /* On P9, rs=0, RIC=0, PRS=0, R=0 */ addi r7,r7,0x1000 bdnz 28b - b 30f -29: PPC_TLBIEL(7,0,2,1,1) /* for radix, RIC=2, PRS=1, R=1 */ - addi r7,r7,0x1000 - bdnz 29b -30: ptesync + ptesync 23: ldarx r7,0,r6 /* clear the bit after TLB flushed */ andc r7,r7,r8 stdcx. r7,0,r6 @@ -799,7 +796,10 @@ END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_HV_ASSIST, 0) /* * NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR */ - bl kvmppc_restore_tm + mr r3, r4 + ld r4, VCPU_MSR(r3) + bl kvmppc_restore_tm_hv + ld r4, HSTATE_KVM_VCPU(r13) 91: #endif @@ -1783,7 +1783,10 @@ END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_HV_ASSIST, 0) /* * NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR */ - bl kvmppc_save_tm + mr r3, r9 + ld r4, VCPU_MSR(r3) + bl kvmppc_save_tm_hv + ld r9, HSTATE_KVM_VCPU(r13) 91: #endif @@ -2686,8 +2689,9 @@ END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_HV_ASSIST, 0) /* * NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR */ - ld r9, HSTATE_KVM_VCPU(r13) - bl kvmppc_save_tm + ld r3, HSTATE_KVM_VCPU(r13) + ld r4, VCPU_MSR(r3) + bl kvmppc_save_tm_hv 91: #endif @@ -2805,7 +2809,10 @@ END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_HV_ASSIST, 0) /* * NOTE THAT THIS TRASHES ALL NON-VOLATILE REGISTERS INCLUDING CR */ - bl kvmppc_restore_tm + mr r3, r4 + ld r4, VCPU_MSR(r3) + bl kvmppc_restore_tm_hv + ld r4, HSTATE_KVM_VCPU(r13) 91: #endif @@ -3126,11 +3133,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* * Save transactional state and TM-related registers. - * Called with r9 pointing to the vcpu struct. + * Called with r3 pointing to the vcpu struct and r4 containing + * the guest MSR value. * This can modify all checkpointed registers, but - * restores r1, r2 and r9 (vcpu pointer) before exit. + * restores r1 and r2 before exit. */ -kvmppc_save_tm: +kvmppc_save_tm_hv: + /* See if we need to handle fake suspend mode */ +BEGIN_FTR_SECTION + b __kvmppc_save_tm +END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST) + + lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */ + cmpwi r0, 0 + beq __kvmppc_save_tm + + /* The following code handles the fake_suspend = 1 case */ mflr r0 std r0, PPC_LR_STKOFF(r1) stdu r1, -PPC_MIN_STKFRM(r1) @@ -3141,59 +3159,37 @@ kvmppc_save_tm: rldimi r8, r0, MSR_TM_LG, 63-MSR_TM_LG mtmsrd r8 - ld r5, VCPU_MSR(r9) - rldicl. r5, r5, 64 - MSR_TS_S_LG, 62 - beq 1f /* TM not active in guest. */ - - std r1, HSTATE_HOST_R1(r13) - li r3, TM_CAUSE_KVM_RESCHED - -BEGIN_FTR_SECTION - lbz r0, HSTATE_FAKE_SUSPEND(r13) /* Were we fake suspended? */ - cmpwi r0, 0 - beq 3f rldicl. r8, r8, 64 - MSR_TS_S_LG, 62 /* Did we actually hrfid? */ beq 4f -BEGIN_FTR_SECTION_NESTED(96) +BEGIN_FTR_SECTION bl pnv_power9_force_smt4_catch -END_FTR_SECTION_NESTED(CPU_FTR_P9_TM_XER_SO_BUG, CPU_FTR_P9_TM_XER_SO_BUG, 96) +END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_XER_SO_BUG) nop - b 6f -3: - /* Emulation of the treclaim instruction needs TEXASR before treclaim */ - mfspr r6, SPRN_TEXASR - std r6, VCPU_ORIG_TEXASR(r9) -6: -END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_HV_ASSIST) - /* Clear the MSR RI since r1, r13 are all going to be foobar. */ + std r1, HSTATE_HOST_R1(r13) + + /* Clear the MSR RI since r1, r13 may be foobar. */ li r5, 0 mtmsrd r5, 1 - /* All GPRs are volatile at this point. */ + /* We have to treclaim here because that's the only way to do S->N */ + li r3, TM_CAUSE_KVM_RESCHED TRECLAIM(R3) - /* Temporarily store r13 and r9 so we have some regs to play with */ - SET_SCRATCH0(r13) - GET_PACA(r13) - std r9, PACATMSCRATCH(r13) - - /* If doing TM emulation on POWER9 DD2.2, check for fake suspend mode */ -BEGIN_FTR_SECTION - lbz r9, HSTATE_FAKE_SUSPEND(r13) - cmpwi r9, 0 - beq 2f /* * We were in fake suspend, so we are not going to save the * register state as the guest checkpointed state (since * we already have it), therefore we can now use any volatile GPR. */ - /* Reload stack pointer and TOC. */ + /* Reload PACA pointer, stack pointer and TOC. */ + GET_PACA(r13) ld r1, HSTATE_HOST_R1(r13) ld r2, PACATOC(r13) + /* Set MSR RI now we have r1 and r13 back. */ li r5, MSR_RI mtmsrd r5, 1 + HMT_MEDIUM ld r6, HSTATE_DSCR(r13) mtspr SPRN_DSCR, r6 @@ -3208,85 +3204,9 @@ END_FTR_SECTION_NESTED(CPU_FTR_P9_TM_XER_SO_BUG, CPU_FTR_P9_TM_XER_SO_BUG, 96) li r0, PSSCR_FAKE_SUSPEND andc r3, r3, r0 mtspr SPRN_PSSCR, r3 - ld r9, HSTATE_KVM_VCPU(r13) - /* Don't save TEXASR, use value from last exit in real suspend state */ - b 11f -2: -END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_HV_ASSIST) + /* Don't save TEXASR, use value from last exit in real suspend state */ ld r9, HSTATE_KVM_VCPU(r13) - - /* Get a few more GPRs free. */ - std r29, VCPU_GPRS_TM(29)(r9) - std r30, VCPU_GPRS_TM(30)(r9) - std r31, VCPU_GPRS_TM(31)(r9) - - /* Save away PPR and DSCR soon so don't run with user values. */ - mfspr r31, SPRN_PPR - HMT_MEDIUM - mfspr r30, SPRN_DSCR - ld r29, HSTATE_DSCR(r13) - mtspr SPRN_DSCR, r29 - - /* Save all but r9, r13 & r29-r31 */ - reg = 0 - .rept 29 - .if (reg != 9) && (reg != 13) - std reg, VCPU_GPRS_TM(reg)(r9) - .endif - reg = reg + 1 - .endr - /* ... now save r13 */ - GET_SCRATCH0(r4) - std r4, VCPU_GPRS_TM(13)(r9) - /* ... and save r9 */ - ld r4, PACATMSCRATCH(r13) - std r4, VCPU_GPRS_TM(9)(r9) - - /* Reload stack pointer and TOC. */ - ld r1, HSTATE_HOST_R1(r13) - ld r2, PACATOC(r13) - - /* Set MSR RI now we have r1 and r13 back. */ - li r5, MSR_RI - mtmsrd r5, 1 - - /* Save away checkpinted SPRs. */ - std r31, VCPU_PPR_TM(r9) - std r30, VCPU_DSCR_TM(r9) - mflr r5 - mfcr r6 - mfctr r7 - mfspr r8, SPRN_AMR - mfspr r10, SPRN_TAR - mfxer r11 - std r5, VCPU_LR_TM(r9) - stw r6, VCPU_CR_TM(r9) - std r7, VCPU_CTR_TM(r9) - std r8, VCPU_AMR_TM(r9) - std r10, VCPU_TAR_TM(r9) - std r11, VCPU_XER_TM(r9) - - /* Restore r12 as trap number. */ - lwz r12, VCPU_TRAP(r9) - - /* Save FP/VSX. */ - addi r3, r9, VCPU_FPRS_TM - bl store_fp_state - addi r3, r9, VCPU_VRS_TM - bl store_vr_state - mfspr r6, SPRN_VRSAVE - stw r6, VCPU_VRSAVE_TM(r9) -1: - /* - * We need to save these SPRs after the treclaim so that the software - * error code is recorded correctly in the TEXASR. Also the user may - * change these outside of a transaction, so they must always be - * context switched. - */ - mfspr r7, SPRN_TEXASR - std r7, VCPU_TEXASR(r9) -11: mfspr r5, SPRN_TFHAR mfspr r6, SPRN_TFIAR std r5, VCPU_TFHAR(r9) @@ -3299,149 +3219,63 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_HV_ASSIST) /* * Restore transactional state and TM-related registers. - * Called with r4 pointing to the vcpu struct. + * Called with r3 pointing to the vcpu struct + * and r4 containing the guest MSR value. * This potentially modifies all checkpointed registers. - * It restores r1, r2, r4 from the PACA. + * It restores r1 and r2 from the PACA. */ -kvmppc_restore_tm: +kvmppc_restore_tm_hv: + /* + * If we are doing TM emulation for the guest on a POWER9 DD2, + * then we don't actually do a trechkpt -- we either set up + * fake-suspend mode, or emulate a TM rollback. + */ +BEGIN_FTR_SECTION + b __kvmppc_restore_tm +END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST) mflr r0 std r0, PPC_LR_STKOFF(r1) - /* Turn on TM/FP/VSX/VMX so we can restore them. */ + li r0, 0 + stb r0, HSTATE_FAKE_SUSPEND(r13) + + /* Turn on TM so we can restore TM SPRs */ mfmsr r5 - li r6, MSR_TM >> 32 - sldi r6, r6, 32 - or r5, r5, r6 - ori r5, r5, MSR_FP - oris r5, r5, (MSR_VEC | MSR_VSX)@h + li r0, 1 + rldimi r5, r0, MSR_TM_LG, 63-MSR_TM_LG mtmsrd r5 /* * The user may change these outside of a transaction, so they must * always be context switched. */ - ld r5, VCPU_TFHAR(r4) - ld r6, VCPU_TFIAR(r4) - ld r7, VCPU_TEXASR(r4) + ld r5, VCPU_TFHAR(r3) + ld r6, VCPU_TFIAR(r3) + ld r7, VCPU_TEXASR(r3) mtspr SPRN_TFHAR, r5 mtspr SPRN_TFIAR, r6 mtspr SPRN_TEXASR, r7 - li r0, 0 - stb r0, HSTATE_FAKE_SUSPEND(r13) - ld r5, VCPU_MSR(r4) - rldicl. r5, r5, 64 - MSR_TS_S_LG, 62 + rldicl. r5, r4, 64 - MSR_TS_S_LG, 62 beqlr /* TM not active in guest */ - std r1, HSTATE_HOST_R1(r13) - /* Make sure the failure summary is set, otherwise we'll program check - * when we trechkpt. It's possible that this might have been not set - * on a kvmppc_set_one_reg() call but we shouldn't let this crash the - * host. - */ + /* Make sure the failure summary is set */ oris r7, r7, (TEXASR_FS)@h mtspr SPRN_TEXASR, r7 - /* - * If we are doing TM emulation for the guest on a POWER9 DD2, - * then we don't actually do a trechkpt -- we either set up - * fake-suspend mode, or emulate a TM rollback. - */ -BEGIN_FTR_SECTION - b .Ldo_tm_fake_load -END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_HV_ASSIST) - - /* - * We need to load up the checkpointed state for the guest. - * We need to do this early as it will blow away any GPRs, VSRs and - * some SPRs. - */ - - mr r31, r4 - addi r3, r31, VCPU_FPRS_TM - bl load_fp_state - addi r3, r31, VCPU_VRS_TM - bl load_vr_state - mr r4, r31 - lwz r7, VCPU_VRSAVE_TM(r4) - mtspr SPRN_VRSAVE, r7 - - ld r5, VCPU_LR_TM(r4) - lwz r6, VCPU_CR_TM(r4) - ld r7, VCPU_CTR_TM(r4) - ld r8, VCPU_AMR_TM(r4) - ld r9, VCPU_TAR_TM(r4) - ld r10, VCPU_XER_TM(r4) - mtlr r5 - mtcr r6 - mtctr r7 - mtspr SPRN_AMR, r8 - mtspr SPRN_TAR, r9 - mtxer r10 - - /* - * Load up PPR and DSCR values but don't put them in the actual SPRs - * till the last moment to avoid running with userspace PPR and DSCR for - * too long. - */ - ld r29, VCPU_DSCR_TM(r4) - ld r30, VCPU_PPR_TM(r4) - - std r2, PACATMSCRATCH(r13) /* Save TOC */ - - /* Clear the MSR RI since r1, r13 are all going to be foobar. */ - li r5, 0 - mtmsrd r5, 1 - - /* Load GPRs r0-r28 */ - reg = 0 - .rept 29 - ld reg, VCPU_GPRS_TM(reg)(r31) - reg = reg + 1 - .endr - - mtspr SPRN_DSCR, r29 - mtspr SPRN_PPR, r30 - - /* Load final GPRs */ - ld 29, VCPU_GPRS_TM(29)(r31) - ld 30, VCPU_GPRS_TM(30)(r31) - ld 31, VCPU_GPRS_TM(31)(r31) - - /* TM checkpointed state is now setup. All GPRs are now volatile. */ - TRECHKPT - - /* Now let's get back the state we need. */ - HMT_MEDIUM - GET_PACA(r13) - ld r29, HSTATE_DSCR(r13) - mtspr SPRN_DSCR, r29 - ld r4, HSTATE_KVM_VCPU(r13) - ld r1, HSTATE_HOST_R1(r13) - ld r2, PACATMSCRATCH(r13) - - /* Set the MSR RI since we have our registers back. */ - li r5, MSR_RI - mtmsrd r5, 1 -9: - ld r0, PPC_LR_STKOFF(r1) - mtlr r0 - blr - -.Ldo_tm_fake_load: cmpwi r5, 1 /* check for suspended state */ bgt 10f stb r5, HSTATE_FAKE_SUSPEND(r13) - b 9b /* and return */ + b 9f /* and return */ 10: stdu r1, -PPC_MIN_STKFRM(r1) /* guest is in transactional state, so simulate rollback */ - mr r3, r4 bl kvmhv_emulate_tm_rollback nop - ld r4, HSTATE_KVM_VCPU(r13) /* our vcpu pointer has been trashed */ addi r1, r1, PPC_MIN_STKFRM - b 9b -#endif +9: ld r0, PPC_LR_STKOFF(r1) + mtlr r0 + blr +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ /* * We come here if we get any exception or interrupt while we are @@ -3572,6 +3406,8 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX) bcl 20, 31, .+4 5: mflr r3 addi r3, r3, 9f - 5b + li r4, -1 + rldimi r3, r4, 62, 0 /* ensure 0xc000000000000000 bits are set */ ld r4, PACAKMSR(r13) mtspr SPRN_SRR0, r3 mtspr SPRN_SRR1, r4 diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c index bf710ad3a6d7..008285058f9b 100644 --- a/arch/powerpc/kvm/book3s_hv_tm.c +++ b/arch/powerpc/kvm/book3s_hv_tm.c @@ -19,7 +19,7 @@ static void emulate_tx_failure(struct kvm_vcpu *vcpu, u64 failure_cause) u64 texasr, tfiar; u64 msr = vcpu->arch.shregs.msr; - tfiar = vcpu->arch.pc & ~0x3ull; + tfiar = vcpu->arch.regs.nip & ~0x3ull; texasr = (failure_cause << 56) | TEXASR_ABORT | TEXASR_FS | TEXASR_EXACT; if (MSR_TM_SUSPENDED(vcpu->arch.shregs.msr)) texasr |= TEXASR_SUSP; @@ -57,8 +57,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu) (newmsr & MSR_TM))); newmsr = sanitize_msr(newmsr); vcpu->arch.shregs.msr = newmsr; - vcpu->arch.cfar = vcpu->arch.pc - 4; - vcpu->arch.pc = vcpu->arch.shregs.srr0; + vcpu->arch.cfar = vcpu->arch.regs.nip - 4; + vcpu->arch.regs.nip = vcpu->arch.shregs.srr0; return RESUME_GUEST; case PPC_INST_RFEBB: @@ -90,8 +90,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu) vcpu->arch.bescr = bescr; msr = (msr & ~MSR_TS_MASK) | MSR_TS_T; vcpu->arch.shregs.msr = msr; - vcpu->arch.cfar = vcpu->arch.pc - 4; - vcpu->arch.pc = vcpu->arch.ebbrr; + vcpu->arch.cfar = vcpu->arch.regs.nip - 4; + vcpu->arch.regs.nip = vcpu->arch.ebbrr; return RESUME_GUEST; case PPC_INST_MTMSRD: diff --git a/arch/powerpc/kvm/book3s_hv_tm_builtin.c b/arch/powerpc/kvm/book3s_hv_tm_builtin.c index d98ccfd2b88c..b2c7c6fca4f9 100644 --- a/arch/powerpc/kvm/book3s_hv_tm_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_tm_builtin.c @@ -35,8 +35,8 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu) return 0; newmsr = sanitize_msr(newmsr); vcpu->arch.shregs.msr = newmsr; - vcpu->arch.cfar = vcpu->arch.pc - 4; - vcpu->arch.pc = vcpu->arch.shregs.srr0; + vcpu->arch.cfar = vcpu->arch.regs.nip - 4; + vcpu->arch.regs.nip = vcpu->arch.shregs.srr0; return 1; case PPC_INST_RFEBB: @@ -58,8 +58,8 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu) mtspr(SPRN_BESCR, bescr); msr = (msr & ~MSR_TS_MASK) | MSR_TS_T; vcpu->arch.shregs.msr = msr; - vcpu->arch.cfar = vcpu->arch.pc - 4; - vcpu->arch.pc = mfspr(SPRN_EBBRR); + vcpu->arch.cfar = vcpu->arch.regs.nip - 4; + vcpu->arch.regs.nip = mfspr(SPRN_EBBRR); return 1; case PPC_INST_MTMSRD: @@ -103,7 +103,7 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu) void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu) { vcpu->arch.shregs.msr &= ~MSR_TS_MASK; /* go to N state */ - vcpu->arch.pc = vcpu->arch.tfhar; + vcpu->arch.regs.nip = vcpu->arch.tfhar; copy_from_checkpoint(vcpu); vcpu->arch.cr = (vcpu->arch.cr & 0x0fffffff) | 0xa0000000; } diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index d3f304d06adf..c3b8006f0eac 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -42,6 +42,8 @@ #include <linux/highmem.h> #include <linux/module.h> #include <linux/miscdevice.h> +#include <asm/asm-prototypes.h> +#include <asm/tm.h> #include "book3s.h" @@ -53,7 +55,9 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr, ulong msr); -static void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); +#ifdef CONFIG_PPC_BOOK3S_64 +static int kvmppc_handle_fac(struct kvm_vcpu *vcpu, ulong fac); +#endif /* Some compatibility defines */ #ifdef CONFIG_PPC_BOOK3S_32 @@ -114,6 +118,8 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu) if (kvmppc_is_split_real(vcpu)) kvmppc_fixup_split_real(vcpu); + + kvmppc_restore_tm_pr(vcpu); } static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu) @@ -133,6 +139,7 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu) kvmppc_giveup_ext(vcpu, MSR_FP | MSR_VEC | MSR_VSX); kvmppc_giveup_fac(vcpu, FSCR_TAR_LG); + kvmppc_save_tm_pr(vcpu); /* Enable AIL if supported */ if (cpu_has_feature(CPU_FTR_HVMODE) && @@ -147,25 +154,25 @@ void kvmppc_copy_to_svcpu(struct kvm_vcpu *vcpu) { struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu); - svcpu->gpr[0] = vcpu->arch.gpr[0]; - svcpu->gpr[1] = vcpu->arch.gpr[1]; - svcpu->gpr[2] = vcpu->arch.gpr[2]; - svcpu->gpr[3] = vcpu->arch.gpr[3]; - svcpu->gpr[4] = vcpu->arch.gpr[4]; - svcpu->gpr[5] = vcpu->arch.gpr[5]; - svcpu->gpr[6] = vcpu->arch.gpr[6]; - svcpu->gpr[7] = vcpu->arch.gpr[7]; - svcpu->gpr[8] = vcpu->arch.gpr[8]; - svcpu->gpr[9] = vcpu->arch.gpr[9]; - svcpu->gpr[10] = vcpu->arch.gpr[10]; - svcpu->gpr[11] = vcpu->arch.gpr[11]; - svcpu->gpr[12] = vcpu->arch.gpr[12]; - svcpu->gpr[13] = vcpu->arch.gpr[13]; + svcpu->gpr[0] = vcpu->arch.regs.gpr[0]; + svcpu->gpr[1] = vcpu->arch.regs.gpr[1]; + svcpu->gpr[2] = vcpu->arch.regs.gpr[2]; + svcpu->gpr[3] = vcpu->arch.regs.gpr[3]; + svcpu->gpr[4] = vcpu->arch.regs.gpr[4]; + svcpu->gpr[5] = vcpu->arch.regs.gpr[5]; + svcpu->gpr[6] = vcpu->arch.regs.gpr[6]; + svcpu->gpr[7] = vcpu->arch.regs.gpr[7]; + svcpu->gpr[8] = vcpu->arch.regs.gpr[8]; + svcpu->gpr[9] = vcpu->arch.regs.gpr[9]; + svcpu->gpr[10] = vcpu->arch.regs.gpr[10]; + svcpu->gpr[11] = vcpu->arch.regs.gpr[11]; + svcpu->gpr[12] = vcpu->arch.regs.gpr[12]; + svcpu->gpr[13] = vcpu->arch.regs.gpr[13]; svcpu->cr = vcpu->arch.cr; - svcpu->xer = vcpu->arch.xer; - svcpu->ctr = vcpu->arch.ctr; - svcpu->lr = vcpu->arch.lr; - svcpu->pc = vcpu->arch.pc; + svcpu->xer = vcpu->arch.regs.xer; + svcpu->ctr = vcpu->arch.regs.ctr; + svcpu->lr = vcpu->arch.regs.link; + svcpu->pc = vcpu->arch.regs.nip; #ifdef CONFIG_PPC_BOOK3S_64 svcpu->shadow_fscr = vcpu->arch.shadow_fscr; #endif @@ -182,10 +189,45 @@ void kvmppc_copy_to_svcpu(struct kvm_vcpu *vcpu) svcpu_put(svcpu); } +static void kvmppc_recalc_shadow_msr(struct kvm_vcpu *vcpu) +{ + ulong guest_msr = kvmppc_get_msr(vcpu); + ulong smsr = guest_msr; + + /* Guest MSR values */ +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + smsr &= MSR_FE0 | MSR_FE1 | MSR_SF | MSR_SE | MSR_BE | MSR_LE | + MSR_TM | MSR_TS_MASK; +#else + smsr &= MSR_FE0 | MSR_FE1 | MSR_SF | MSR_SE | MSR_BE | MSR_LE; +#endif + /* Process MSR values */ + smsr |= MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_PR | MSR_EE; + /* External providers the guest reserved */ + smsr |= (guest_msr & vcpu->arch.guest_owned_ext); + /* 64-bit Process MSR values */ +#ifdef CONFIG_PPC_BOOK3S_64 + smsr |= MSR_ISF | MSR_HV; +#endif +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* + * in guest privileged state, we want to fail all TM transactions. + * So disable MSR TM bit so that all tbegin. will be able to be + * trapped into host. + */ + if (!(guest_msr & MSR_PR)) + smsr &= ~MSR_TM; +#endif + vcpu->arch.shadow_msr = smsr; +} + /* Copy data touched by real-mode code from shadow vcpu back to vcpu */ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu) { struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu); +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + ulong old_msr; +#endif /* * Maybe we were already preempted and synced the svcpu from @@ -194,25 +236,25 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu) if (!svcpu->in_use) goto out; - vcpu->arch.gpr[0] = svcpu->gpr[0]; - vcpu->arch.gpr[1] = svcpu->gpr[1]; - vcpu->arch.gpr[2] = svcpu->gpr[2]; - vcpu->arch.gpr[3] = svcpu->gpr[3]; - vcpu->arch.gpr[4] = svcpu->gpr[4]; - vcpu->arch.gpr[5] = svcpu->gpr[5]; - vcpu->arch.gpr[6] = svcpu->gpr[6]; - vcpu->arch.gpr[7] = svcpu->gpr[7]; - vcpu->arch.gpr[8] = svcpu->gpr[8]; - vcpu->arch.gpr[9] = svcpu->gpr[9]; - vcpu->arch.gpr[10] = svcpu->gpr[10]; - vcpu->arch.gpr[11] = svcpu->gpr[11]; - vcpu->arch.gpr[12] = svcpu->gpr[12]; - vcpu->arch.gpr[13] = svcpu->gpr[13]; + vcpu->arch.regs.gpr[0] = svcpu->gpr[0]; + vcpu->arch.regs.gpr[1] = svcpu->gpr[1]; + vcpu->arch.regs.gpr[2] = svcpu->gpr[2]; + vcpu->arch.regs.gpr[3] = svcpu->gpr[3]; + vcpu->arch.regs.gpr[4] = svcpu->gpr[4]; + vcpu->arch.regs.gpr[5] = svcpu->gpr[5]; + vcpu->arch.regs.gpr[6] = svcpu->gpr[6]; + vcpu->arch.regs.gpr[7] = svcpu->gpr[7]; + vcpu->arch.regs.gpr[8] = svcpu->gpr[8]; + vcpu->arch.regs.gpr[9] = svcpu->gpr[9]; + vcpu->arch.regs.gpr[10] = svcpu->gpr[10]; + vcpu->arch.regs.gpr[11] = svcpu->gpr[11]; + vcpu->arch.regs.gpr[12] = svcpu->gpr[12]; + vcpu->arch.regs.gpr[13] = svcpu->gpr[13]; vcpu->arch.cr = svcpu->cr; - vcpu->arch.xer = svcpu->xer; - vcpu->arch.ctr = svcpu->ctr; - vcpu->arch.lr = svcpu->lr; - vcpu->arch.pc = svcpu->pc; + vcpu->arch.regs.xer = svcpu->xer; + vcpu->arch.regs.ctr = svcpu->ctr; + vcpu->arch.regs.link = svcpu->lr; + vcpu->arch.regs.nip = svcpu->pc; vcpu->arch.shadow_srr1 = svcpu->shadow_srr1; vcpu->arch.fault_dar = svcpu->fault_dar; vcpu->arch.fault_dsisr = svcpu->fault_dsisr; @@ -228,12 +270,116 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu) to_book3s(vcpu)->vtb += get_vtb() - vcpu->arch.entry_vtb; if (cpu_has_feature(CPU_FTR_ARCH_207S)) vcpu->arch.ic += mfspr(SPRN_IC) - vcpu->arch.entry_ic; + +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* + * Unlike other MSR bits, MSR[TS]bits can be changed at guest without + * notifying host: + * modified by unprivileged instructions like "tbegin"/"tend"/ + * "tresume"/"tsuspend" in PR KVM guest. + * + * It is necessary to sync here to calculate a correct shadow_msr. + * + * privileged guest's tbegin will be failed at present. So we + * only take care of problem state guest. + */ + old_msr = kvmppc_get_msr(vcpu); + if (unlikely((old_msr & MSR_PR) && + (vcpu->arch.shadow_srr1 & (MSR_TS_MASK)) != + (old_msr & (MSR_TS_MASK)))) { + old_msr &= ~(MSR_TS_MASK); + old_msr |= (vcpu->arch.shadow_srr1 & (MSR_TS_MASK)); + kvmppc_set_msr_fast(vcpu, old_msr); + kvmppc_recalc_shadow_msr(vcpu); + } +#endif + svcpu->in_use = false; out: svcpu_put(svcpu); } +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +void kvmppc_save_tm_sprs(struct kvm_vcpu *vcpu) +{ + tm_enable(); + vcpu->arch.tfhar = mfspr(SPRN_TFHAR); + vcpu->arch.texasr = mfspr(SPRN_TEXASR); + vcpu->arch.tfiar = mfspr(SPRN_TFIAR); + tm_disable(); +} + +void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu) +{ + tm_enable(); + mtspr(SPRN_TFHAR, vcpu->arch.tfhar); + mtspr(SPRN_TEXASR, vcpu->arch.texasr); + mtspr(SPRN_TFIAR, vcpu->arch.tfiar); + tm_disable(); +} + +/* loadup math bits which is enabled at kvmppc_get_msr() but not enabled at + * hardware. + */ +static void kvmppc_handle_lost_math_exts(struct kvm_vcpu *vcpu) +{ + ulong exit_nr; + ulong ext_diff = (kvmppc_get_msr(vcpu) & ~vcpu->arch.guest_owned_ext) & + (MSR_FP | MSR_VEC | MSR_VSX); + + if (!ext_diff) + return; + + if (ext_diff == MSR_FP) + exit_nr = BOOK3S_INTERRUPT_FP_UNAVAIL; + else if (ext_diff == MSR_VEC) + exit_nr = BOOK3S_INTERRUPT_ALTIVEC; + else + exit_nr = BOOK3S_INTERRUPT_VSX; + + kvmppc_handle_ext(vcpu, exit_nr, ext_diff); +} + +void kvmppc_save_tm_pr(struct kvm_vcpu *vcpu) +{ + if (!(MSR_TM_ACTIVE(kvmppc_get_msr(vcpu)))) { + kvmppc_save_tm_sprs(vcpu); + return; + } + + kvmppc_giveup_fac(vcpu, FSCR_TAR_LG); + kvmppc_giveup_ext(vcpu, MSR_VSX); + + preempt_disable(); + _kvmppc_save_tm_pr(vcpu, mfmsr()); + preempt_enable(); +} + +void kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu) +{ + if (!MSR_TM_ACTIVE(kvmppc_get_msr(vcpu))) { + kvmppc_restore_tm_sprs(vcpu); + if (kvmppc_get_msr(vcpu) & MSR_TM) { + kvmppc_handle_lost_math_exts(vcpu); + if (vcpu->arch.fscr & FSCR_TAR) + kvmppc_handle_fac(vcpu, FSCR_TAR_LG); + } + return; + } + + preempt_disable(); + _kvmppc_restore_tm_pr(vcpu, kvmppc_get_msr(vcpu)); + preempt_enable(); + + if (kvmppc_get_msr(vcpu) & MSR_TM) { + kvmppc_handle_lost_math_exts(vcpu); + if (vcpu->arch.fscr & FSCR_TAR) + kvmppc_handle_fac(vcpu, FSCR_TAR_LG); + } +} +#endif + static int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu) { int r = 1; /* Indicate we want to get back into the guest */ @@ -306,32 +452,29 @@ static void kvm_set_spte_hva_pr(struct kvm *kvm, unsigned long hva, pte_t pte) /*****************************************/ -static void kvmppc_recalc_shadow_msr(struct kvm_vcpu *vcpu) -{ - ulong guest_msr = kvmppc_get_msr(vcpu); - ulong smsr = guest_msr; - - /* Guest MSR values */ - smsr &= MSR_FE0 | MSR_FE1 | MSR_SF | MSR_SE | MSR_BE | MSR_LE; - /* Process MSR values */ - smsr |= MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_PR | MSR_EE; - /* External providers the guest reserved */ - smsr |= (guest_msr & vcpu->arch.guest_owned_ext); - /* 64-bit Process MSR values */ -#ifdef CONFIG_PPC_BOOK3S_64 - smsr |= MSR_ISF | MSR_HV; -#endif - vcpu->arch.shadow_msr = smsr; -} - static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr) { - ulong old_msr = kvmppc_get_msr(vcpu); + ulong old_msr; + + /* For PAPR guest, make sure MSR reflects guest mode */ + if (vcpu->arch.papr_enabled) + msr = (msr & ~MSR_HV) | MSR_ME; #ifdef EXIT_DEBUG printk(KERN_INFO "KVM: Set MSR to 0x%llx\n", msr); #endif +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* We should never target guest MSR to TS=10 && PR=0, + * since we always fail transaction for guest privilege + * state. + */ + if (!(msr & MSR_PR) && MSR_TM_TRANSACTIONAL(msr)) + kvmppc_emulate_tabort(vcpu, + TM_CAUSE_KVM_FAC_UNAV | TM_CAUSE_PERSISTENT); +#endif + + old_msr = kvmppc_get_msr(vcpu); msr &= to_book3s(vcpu)->msr_mask; kvmppc_set_msr_fast(vcpu, msr); kvmppc_recalc_shadow_msr(vcpu); @@ -387,6 +530,11 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr) /* Preload FPU if it's enabled */ if (kvmppc_get_msr(vcpu) & MSR_FP) kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP); + +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + if (kvmppc_get_msr(vcpu) & MSR_TM) + kvmppc_handle_lost_math_exts(vcpu); +#endif } void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr) @@ -584,24 +732,20 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, pte.may_execute = !data; } - if (page_found == -ENOENT) { - /* Page not found in guest PTE entries */ - u64 ssrr1 = vcpu->arch.shadow_srr1; - u64 msr = kvmppc_get_msr(vcpu); - kvmppc_set_dar(vcpu, kvmppc_get_fault_dar(vcpu)); - kvmppc_set_dsisr(vcpu, vcpu->arch.fault_dsisr); - kvmppc_set_msr_fast(vcpu, msr | (ssrr1 & 0xf8000000ULL)); - kvmppc_book3s_queue_irqprio(vcpu, vec); - } else if (page_found == -EPERM) { - /* Storage protection */ - u32 dsisr = vcpu->arch.fault_dsisr; - u64 ssrr1 = vcpu->arch.shadow_srr1; - u64 msr = kvmppc_get_msr(vcpu); - kvmppc_set_dar(vcpu, kvmppc_get_fault_dar(vcpu)); - dsisr = (dsisr & ~DSISR_NOHPTE) | DSISR_PROTFAULT; - kvmppc_set_dsisr(vcpu, dsisr); - kvmppc_set_msr_fast(vcpu, msr | (ssrr1 & 0xf8000000ULL)); - kvmppc_book3s_queue_irqprio(vcpu, vec); + if (page_found == -ENOENT || page_found == -EPERM) { + /* Page not found in guest PTE entries, or protection fault */ + u64 flags; + + if (page_found == -EPERM) + flags = DSISR_PROTFAULT; + else + flags = DSISR_NOHPTE; + if (data) { + flags |= vcpu->arch.fault_dsisr & DSISR_ISSTORE; + kvmppc_core_queue_data_storage(vcpu, eaddr, flags); + } else { + kvmppc_core_queue_inst_storage(vcpu, flags); + } } else if (page_found == -EINVAL) { /* Page not found in guest SLB */ kvmppc_set_dar(vcpu, kvmppc_get_fault_dar(vcpu)); @@ -683,7 +827,7 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr) } /* Give up facility (TAR / EBB / DSCR) */ -static void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac) +void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac) { #ifdef CONFIG_PPC_BOOK3S_64 if (!(vcpu->arch.shadow_fscr & (1ULL << fac))) { @@ -802,7 +946,7 @@ static void kvmppc_handle_lost_ext(struct kvm_vcpu *vcpu) #ifdef CONFIG_PPC_BOOK3S_64 -static void kvmppc_trigger_fac_interrupt(struct kvm_vcpu *vcpu, ulong fac) +void kvmppc_trigger_fac_interrupt(struct kvm_vcpu *vcpu, ulong fac) { /* Inject the Interrupt Cause field and trigger a guest interrupt */ vcpu->arch.fscr &= ~(0xffULL << 56); @@ -864,6 +1008,18 @@ static int kvmppc_handle_fac(struct kvm_vcpu *vcpu, ulong fac) break; } +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* Since we disabled MSR_TM at privilege state, the mfspr instruction + * for TM spr can trigger TM fac unavailable. In this case, the + * emulation is handled by kvmppc_emulate_fac(), which invokes + * kvmppc_emulate_mfspr() finally. But note the mfspr can include + * RT for NV registers. So it need to restore those NV reg to reflect + * the update. + */ + if ((fac == FSCR_TM_LG) && !(kvmppc_get_msr(vcpu) & MSR_PR)) + return RESUME_GUEST_NV; +#endif + return RESUME_GUEST; } @@ -872,7 +1028,12 @@ void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr) if ((vcpu->arch.fscr & FSCR_TAR) && !(fscr & FSCR_TAR)) { /* TAR got dropped, drop it in shadow too */ kvmppc_giveup_fac(vcpu, FSCR_TAR_LG); + } else if (!(vcpu->arch.fscr & FSCR_TAR) && (fscr & FSCR_TAR)) { + vcpu->arch.fscr = fscr; + kvmppc_handle_fac(vcpu, FSCR_TAR_LG); + return; } + vcpu->arch.fscr = fscr; } #endif @@ -1017,10 +1178,8 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFUL); r = RESUME_GUEST; } else { - u64 msr = kvmppc_get_msr(vcpu); - msr |= shadow_srr1 & 0x58000000; - kvmppc_set_msr_fast(vcpu, msr); - kvmppc_book3s_queue_irqprio(vcpu, exit_nr); + kvmppc_core_queue_inst_storage(vcpu, + shadow_srr1 & 0x58000000); r = RESUME_GUEST; } break; @@ -1059,9 +1218,7 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); srcu_read_unlock(&vcpu->kvm->srcu, idx); } else { - kvmppc_set_dar(vcpu, dar); - kvmppc_set_dsisr(vcpu, fault_dsisr); - kvmppc_book3s_queue_irqprio(vcpu, exit_nr); + kvmppc_core_queue_data_storage(vcpu, dar, fault_dsisr); r = RESUME_GUEST; } break; @@ -1092,10 +1249,13 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, case BOOK3S_INTERRUPT_EXTERNAL: case BOOK3S_INTERRUPT_EXTERNAL_LEVEL: case BOOK3S_INTERRUPT_EXTERNAL_HV: + case BOOK3S_INTERRUPT_H_VIRT: vcpu->stat.ext_intr_exits++; r = RESUME_GUEST; break; + case BOOK3S_INTERRUPT_HMI: case BOOK3S_INTERRUPT_PERFMON: + case BOOK3S_INTERRUPT_SYSTEM_RESET: r = RESUME_GUEST; break; case BOOK3S_INTERRUPT_PROGRAM: @@ -1225,8 +1385,7 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, } #ifdef CONFIG_PPC_BOOK3S_64 case BOOK3S_INTERRUPT_FAC_UNAVAIL: - kvmppc_handle_fac(vcpu, vcpu->arch.shadow_fscr >> 56); - r = RESUME_GUEST; + r = kvmppc_handle_fac(vcpu, vcpu->arch.shadow_fscr >> 56); break; #endif case BOOK3S_INTERRUPT_MACHINE_CHECK: @@ -1379,6 +1538,73 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id, else *val = get_reg_val(id, 0); break; +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + case KVM_REG_PPC_TFHAR: + *val = get_reg_val(id, vcpu->arch.tfhar); + break; + case KVM_REG_PPC_TFIAR: + *val = get_reg_val(id, vcpu->arch.tfiar); + break; + case KVM_REG_PPC_TEXASR: + *val = get_reg_val(id, vcpu->arch.texasr); + break; + case KVM_REG_PPC_TM_GPR0 ... KVM_REG_PPC_TM_GPR31: + *val = get_reg_val(id, + vcpu->arch.gpr_tm[id-KVM_REG_PPC_TM_GPR0]); + break; + case KVM_REG_PPC_TM_VSR0 ... KVM_REG_PPC_TM_VSR63: + { + int i, j; + + i = id - KVM_REG_PPC_TM_VSR0; + if (i < 32) + for (j = 0; j < TS_FPRWIDTH; j++) + val->vsxval[j] = vcpu->arch.fp_tm.fpr[i][j]; + else { + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + val->vval = vcpu->arch.vr_tm.vr[i-32]; + else + r = -ENXIO; + } + break; + } + case KVM_REG_PPC_TM_CR: + *val = get_reg_val(id, vcpu->arch.cr_tm); + break; + case KVM_REG_PPC_TM_XER: + *val = get_reg_val(id, vcpu->arch.xer_tm); + break; + case KVM_REG_PPC_TM_LR: + *val = get_reg_val(id, vcpu->arch.lr_tm); + break; + case KVM_REG_PPC_TM_CTR: + *val = get_reg_val(id, vcpu->arch.ctr_tm); + break; + case KVM_REG_PPC_TM_FPSCR: + *val = get_reg_val(id, vcpu->arch.fp_tm.fpscr); + break; + case KVM_REG_PPC_TM_AMR: + *val = get_reg_val(id, vcpu->arch.amr_tm); + break; + case KVM_REG_PPC_TM_PPR: + *val = get_reg_val(id, vcpu->arch.ppr_tm); + break; + case KVM_REG_PPC_TM_VRSAVE: + *val = get_reg_val(id, vcpu->arch.vrsave_tm); + break; + case KVM_REG_PPC_TM_VSCR: + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + *val = get_reg_val(id, vcpu->arch.vr_tm.vscr.u[3]); + else + r = -ENXIO; + break; + case KVM_REG_PPC_TM_DSCR: + *val = get_reg_val(id, vcpu->arch.dscr_tm); + break; + case KVM_REG_PPC_TM_TAR: + *val = get_reg_val(id, vcpu->arch.tar_tm); + break; +#endif default: r = -EINVAL; break; @@ -1412,6 +1638,72 @@ static int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_LPCR_64: kvmppc_set_lpcr_pr(vcpu, set_reg_val(id, *val)); break; +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + case KVM_REG_PPC_TFHAR: + vcpu->arch.tfhar = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TFIAR: + vcpu->arch.tfiar = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TEXASR: + vcpu->arch.texasr = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_GPR0 ... KVM_REG_PPC_TM_GPR31: + vcpu->arch.gpr_tm[id - KVM_REG_PPC_TM_GPR0] = + set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_VSR0 ... KVM_REG_PPC_TM_VSR63: + { + int i, j; + + i = id - KVM_REG_PPC_TM_VSR0; + if (i < 32) + for (j = 0; j < TS_FPRWIDTH; j++) + vcpu->arch.fp_tm.fpr[i][j] = val->vsxval[j]; + else + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + vcpu->arch.vr_tm.vr[i-32] = val->vval; + else + r = -ENXIO; + break; + } + case KVM_REG_PPC_TM_CR: + vcpu->arch.cr_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_XER: + vcpu->arch.xer_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_LR: + vcpu->arch.lr_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_CTR: + vcpu->arch.ctr_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_FPSCR: + vcpu->arch.fp_tm.fpscr = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_AMR: + vcpu->arch.amr_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_PPR: + vcpu->arch.ppr_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_VRSAVE: + vcpu->arch.vrsave_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_VSCR: + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + vcpu->arch.vr.vscr.u[3] = set_reg_val(id, *val); + else + r = -ENXIO; + break; + case KVM_REG_PPC_TM_DSCR: + vcpu->arch.dscr_tm = set_reg_val(id, *val); + break; + case KVM_REG_PPC_TM_TAR: + vcpu->arch.tar_tm = set_reg_val(id, *val); + break; +#endif default: r = -EINVAL; break; @@ -1687,6 +1979,17 @@ static int kvm_vm_ioctl_get_smmu_info_pr(struct kvm *kvm, return 0; } + +static int kvm_configure_mmu_pr(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg) +{ + if (!cpu_has_feature(CPU_FTR_ARCH_300)) + return -ENODEV; + /* Require flags and process table base and size to all be zero. */ + if (cfg->flags || cfg->process_table) + return -EINVAL; + return 0; +} + #else static int kvm_vm_ioctl_get_smmu_info_pr(struct kvm *kvm, struct kvm_ppc_smmu_info *info) @@ -1735,9 +2038,12 @@ static void kvmppc_core_destroy_vm_pr(struct kvm *kvm) static int kvmppc_core_check_processor_compat_pr(void) { /* - * Disable KVM for Power9 untill the required bits merged. + * PR KVM can work on POWER9 inside a guest partition + * running in HPT mode. It can't work if we are using + * radix translation (because radix provides no way for + * a process to have unique translations in quadrant 3). */ - if (cpu_has_feature(CPU_FTR_ARCH_300)) + if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled()) return -EIO; return 0; } @@ -1781,7 +2087,9 @@ static struct kvmppc_ops kvm_ops_pr = { .arch_vm_ioctl = kvm_arch_vm_ioctl_pr, #ifdef CONFIG_PPC_BOOK3S_64 .hcall_implemented = kvmppc_hcall_impl_pr, + .configure_mmu = kvm_configure_mmu_pr, #endif + .giveup_ext = kvmppc_giveup_ext, }; diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index 93a180ceefad..98ccc7ec5d48 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -383,6 +383,19 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) */ PPC_LL r6, HSTATE_HOST_MSR(r13) +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM + /* + * We don't want to change MSR[TS] bits via rfi here. + * The actual TM handling logic will be in host with + * recovered DR/IR bits after HSTATE_VMHANDLER. + * And MSR_TM can be enabled in HOST_MSR so rfid may + * not suppress this change and can lead to exception. + * Manually set MSR to prevent TS state change here. + */ + mfmsr r7 + rldicl r7, r7, 64 - MSR_TS_S_LG, 62 + rldimi r6, r7, MSR_TS_S_LG, 63 - MSR_TS_T_LG +#endif PPC_LL r8, HSTATE_VMHANDLER(r13) #ifdef CONFIG_PPC64 diff --git a/arch/powerpc/kvm/book3s_xive_template.c b/arch/powerpc/kvm/book3s_xive_template.c index 99c3620b40d9..6e41ba7ec8f4 100644 --- a/arch/powerpc/kvm/book3s_xive_template.c +++ b/arch/powerpc/kvm/book3s_xive_template.c @@ -334,7 +334,7 @@ X_STATIC unsigned long GLUE(X_PFX,h_xirr)(struct kvm_vcpu *vcpu) */ /* Return interrupt and old CPPR in GPR4 */ - vcpu->arch.gpr[4] = hirq | (old_cppr << 24); + vcpu->arch.regs.gpr[4] = hirq | (old_cppr << 24); return H_SUCCESS; } @@ -369,7 +369,7 @@ X_STATIC unsigned long GLUE(X_PFX,h_ipoll)(struct kvm_vcpu *vcpu, unsigned long hirq = GLUE(X_PFX,scan_interrupts)(xc, pending, scan_poll); /* Return interrupt and old CPPR in GPR4 */ - vcpu->arch.gpr[4] = hirq | (xc->cppr << 24); + vcpu->arch.regs.gpr[4] = hirq | (xc->cppr << 24); return H_SUCCESS; } diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 876d4f294fdd..a9ca016da670 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -77,8 +77,10 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) { int i; - printk("pc: %08lx msr: %08llx\n", vcpu->arch.pc, vcpu->arch.shared->msr); - printk("lr: %08lx ctr: %08lx\n", vcpu->arch.lr, vcpu->arch.ctr); + printk("pc: %08lx msr: %08llx\n", vcpu->arch.regs.nip, + vcpu->arch.shared->msr); + printk("lr: %08lx ctr: %08lx\n", vcpu->arch.regs.link, + vcpu->arch.regs.ctr); printk("srr0: %08llx srr1: %08llx\n", vcpu->arch.shared->srr0, vcpu->arch.shared->srr1); @@ -491,24 +493,25 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, if (allowed) { switch (int_class) { case INT_CLASS_NONCRIT: - set_guest_srr(vcpu, vcpu->arch.pc, + set_guest_srr(vcpu, vcpu->arch.regs.nip, vcpu->arch.shared->msr); break; case INT_CLASS_CRIT: - set_guest_csrr(vcpu, vcpu->arch.pc, + set_guest_csrr(vcpu, vcpu->arch.regs.nip, vcpu->arch.shared->msr); break; case INT_CLASS_DBG: - set_guest_dsrr(vcpu, vcpu->arch.pc, + set_guest_dsrr(vcpu, vcpu->arch.regs.nip, vcpu->arch.shared->msr); break; case INT_CLASS_MC: - set_guest_mcsrr(vcpu, vcpu->arch.pc, + set_guest_mcsrr(vcpu, vcpu->arch.regs.nip, vcpu->arch.shared->msr); break; } - vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[priority]; + vcpu->arch.regs.nip = vcpu->arch.ivpr | + vcpu->arch.ivor[priority]; if (update_esr == true) kvmppc_set_esr(vcpu, vcpu->arch.queued_esr); if (update_dear == true) @@ -826,7 +829,7 @@ static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu) case EMULATE_FAIL: printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n", - __func__, vcpu->arch.pc, vcpu->arch.last_inst); + __func__, vcpu->arch.regs.nip, vcpu->arch.last_inst); /* For debugging, encode the failing instruction and * report it to userspace. */ run->hw.hardware_exit_reason = ~0ULL << 32; @@ -875,7 +878,7 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu) */ vcpu->arch.dbsr = 0; run->debug.arch.status = 0; - run->debug.arch.address = vcpu->arch.pc; + run->debug.arch.address = vcpu->arch.regs.nip; if (dbsr & (DBSR_IAC1 | DBSR_IAC2 | DBSR_IAC3 | DBSR_IAC4)) { run->debug.arch.status |= KVMPPC_DEBUG_BREAKPOINT; @@ -971,7 +974,7 @@ static int kvmppc_resume_inst_load(struct kvm_run *run, struct kvm_vcpu *vcpu, case EMULATE_FAIL: pr_debug("%s: load instruction from guest address %lx failed\n", - __func__, vcpu->arch.pc); + __func__, vcpu->arch.regs.nip); /* For debugging, encode the failing instruction and * report it to userspace. */ run->hw.hardware_exit_reason = ~0ULL << 32; @@ -1169,7 +1172,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, case BOOKE_INTERRUPT_SPE_FP_DATA: case BOOKE_INTERRUPT_SPE_FP_ROUND: printk(KERN_CRIT "%s: unexpected SPE interrupt %u at %08lx\n", - __func__, exit_nr, vcpu->arch.pc); + __func__, exit_nr, vcpu->arch.regs.nip); run->hw.hardware_exit_reason = exit_nr; r = RESUME_HOST; break; @@ -1299,7 +1302,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, } case BOOKE_INTERRUPT_ITLB_MISS: { - unsigned long eaddr = vcpu->arch.pc; + unsigned long eaddr = vcpu->arch.regs.nip; gpa_t gpaddr; gfn_t gfn; int gtlb_index; @@ -1391,7 +1394,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) int i; int r; - vcpu->arch.pc = 0; + vcpu->arch.regs.nip = 0; vcpu->arch.shared->pir = vcpu->vcpu_id; kvmppc_set_gpr(vcpu, 1, (16<<20) - 8); /* -8 for the callee-save LR slot */ kvmppc_set_msr(vcpu, 0); @@ -1440,10 +1443,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) vcpu_load(vcpu); - regs->pc = vcpu->arch.pc; + regs->pc = vcpu->arch.regs.nip; regs->cr = kvmppc_get_cr(vcpu); - regs->ctr = vcpu->arch.ctr; - regs->lr = vcpu->arch.lr; + regs->ctr = vcpu->arch.regs.ctr; + regs->lr = vcpu->arch.regs.link; regs->xer = kvmppc_get_xer(vcpu); regs->msr = vcpu->arch.shared->msr; regs->srr0 = kvmppc_get_srr0(vcpu); @@ -1471,10 +1474,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) vcpu_load(vcpu); - vcpu->arch.pc = regs->pc; + vcpu->arch.regs.nip = regs->pc; kvmppc_set_cr(vcpu, regs->cr); - vcpu->arch.ctr = regs->ctr; - vcpu->arch.lr = regs->lr; + vcpu->arch.regs.ctr = regs->ctr; + vcpu->arch.regs.link = regs->lr; kvmppc_set_xer(vcpu, regs->xer); kvmppc_set_msr(vcpu, regs->msr); kvmppc_set_srr0(vcpu, regs->srr0); diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index a82f64502de1..d23e582f0fee 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -34,19 +34,19 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu) { - vcpu->arch.pc = vcpu->arch.shared->srr0; + vcpu->arch.regs.nip = vcpu->arch.shared->srr0; kvmppc_set_msr(vcpu, vcpu->arch.shared->srr1); } static void kvmppc_emul_rfdi(struct kvm_vcpu *vcpu) { - vcpu->arch.pc = vcpu->arch.dsrr0; + vcpu->arch.regs.nip = vcpu->arch.dsrr0; kvmppc_set_msr(vcpu, vcpu->arch.dsrr1); } static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu) { - vcpu->arch.pc = vcpu->arch.csrr0; + vcpu->arch.regs.nip = vcpu->arch.csrr0; kvmppc_set_msr(vcpu, vcpu->arch.csrr1); } diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index 990db69a1d0b..3f8189eb56ed 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -53,7 +53,7 @@ static int dbell2prio(ulong param) static int kvmppc_e500_emul_msgclr(struct kvm_vcpu *vcpu, int rb) { - ulong param = vcpu->arch.gpr[rb]; + ulong param = vcpu->arch.regs.gpr[rb]; int prio = dbell2prio(param); if (prio < 0) @@ -65,7 +65,7 @@ static int kvmppc_e500_emul_msgclr(struct kvm_vcpu *vcpu, int rb) static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, int rb) { - ulong param = vcpu->arch.gpr[rb]; + ulong param = vcpu->arch.regs.gpr[rb]; int prio = dbell2prio(rb); int pir = param & PPC_DBELL_PIR_MASK; int i; @@ -94,7 +94,7 @@ static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_oc(inst)) { case EHPRIV_OC_DEBUG: run->exit_reason = KVM_EXIT_DEBUG; - run->debug.arch.address = vcpu->arch.pc; + run->debug.arch.address = vcpu->arch.regs.nip; run->debug.arch.status = 0; kvmppc_account_exit(vcpu, DEBUG_EXITS); emulated = EMULATE_EXIT_USER; diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c index ddbf8f0284c0..24296f4cadc6 100644 --- a/arch/powerpc/kvm/e500_mmu.c +++ b/arch/powerpc/kvm/e500_mmu.c @@ -513,7 +513,7 @@ void kvmppc_mmu_itlb_miss(struct kvm_vcpu *vcpu) { unsigned int as = !!(vcpu->arch.shared->msr & MSR_IS); - kvmppc_e500_deliver_tlb_miss(vcpu, vcpu->arch.pc, as); + kvmppc_e500_deliver_tlb_miss(vcpu, vcpu->arch.regs.nip, as); } void kvmppc_mmu_dtlb_miss(struct kvm_vcpu *vcpu) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index c878b4ffb86f..8f2985e46f6f 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -625,8 +625,8 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr, } #ifdef CONFIG_KVM_BOOKE_HV -int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type, - u32 *instr) +int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, + enum instruction_fetch_type type, u32 *instr) { gva_t geaddr; hpa_t addr; @@ -715,8 +715,8 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type, return EMULATE_DONE; } #else -int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type, - u32 *instr) +int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, + enum instruction_fetch_type type, u32 *instr) { return EMULATE_AGAIN; } diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c index a382e15135e6..afde788be141 100644 --- a/arch/powerpc/kvm/emulate_loadstore.c +++ b/arch/powerpc/kvm/emulate_loadstore.c @@ -31,6 +31,7 @@ #include <asm/kvm_ppc.h> #include <asm/disassemble.h> #include <asm/ppc-opcode.h> +#include <asm/sstep.h> #include "timing.h" #include "trace.h" @@ -84,8 +85,9 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu) struct kvm_run *run = vcpu->run; u32 inst; int ra, rs, rt; - enum emulation_result emulated; + enum emulation_result emulated = EMULATE_FAIL; int advance = 1; + struct instruction_op op; /* this default type might be overwritten by subcategories */ kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS); @@ -107,580 +109,276 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu) vcpu->arch.mmio_vsx_tx_sx_enabled = get_tx_or_sx(inst); vcpu->arch.mmio_vsx_copy_nums = 0; vcpu->arch.mmio_vsx_offset = 0; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_NONE; + vcpu->arch.mmio_copy_type = KVMPPC_VSX_COPY_NONE; vcpu->arch.mmio_sp64_extend = 0; vcpu->arch.mmio_sign_extend = 0; vcpu->arch.mmio_vmx_copy_nums = 0; + vcpu->arch.mmio_vmx_offset = 0; + vcpu->arch.mmio_host_swabbed = 0; - switch (get_op(inst)) { - case 31: - switch (get_xop(inst)) { - case OP_31_XOP_LWZX: - emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1); - break; - - case OP_31_XOP_LWZUX: - emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; + emulated = EMULATE_FAIL; + vcpu->arch.regs.msr = vcpu->arch.shared->msr; + vcpu->arch.regs.ccr = vcpu->arch.cr; + if (analyse_instr(&op, &vcpu->arch.regs, inst) == 0) { + int type = op.type & INSTR_TYPE_MASK; + int size = GETSIZE(op.type); - case OP_31_XOP_LBZX: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - break; - - case OP_31_XOP_LBZUX: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; + switch (type) { + case LOAD: { + int instr_byte_swap = op.type & BYTEREV; - case OP_31_XOP_STDX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 8, 1); - break; + if (op.type & SIGNEXT) + emulated = kvmppc_handle_loads(run, vcpu, + op.reg, size, !instr_byte_swap); + else + emulated = kvmppc_handle_load(run, vcpu, + op.reg, size, !instr_byte_swap); - case OP_31_XOP_STDUX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; + if ((op.type & UPDATE) && (emulated != EMULATE_FAIL)) + kvmppc_set_gpr(vcpu, op.update_reg, op.ea); - case OP_31_XOP_STWX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 4, 1); - break; - - case OP_31_XOP_STWUX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_STBX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 1, 1); - break; - - case OP_31_XOP_STBUX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_LHAX: - emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1); - break; - - case OP_31_XOP_LHAUX: - emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_LHZX: - emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1); - break; - - case OP_31_XOP_LHZUX: - emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_STHX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 2, 1); - break; - - case OP_31_XOP_STHUX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 2, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_DCBST: - case OP_31_XOP_DCBF: - case OP_31_XOP_DCBI: - /* Do nothing. The guest is performing dcbi because - * hardware DMA is not snooped by the dcache, but - * emulated DMA either goes through the dcache as - * normal writes, or the host kernel has handled dcache - * coherence. */ - break; - - case OP_31_XOP_LWBRX: - emulated = kvmppc_handle_load(run, vcpu, rt, 4, 0); - break; - - case OP_31_XOP_STWBRX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 4, 0); break; - - case OP_31_XOP_LHBRX: - emulated = kvmppc_handle_load(run, vcpu, rt, 2, 0); - break; - - case OP_31_XOP_STHBRX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 2, 0); - break; - - case OP_31_XOP_LDBRX: - emulated = kvmppc_handle_load(run, vcpu, rt, 8, 0); - break; - - case OP_31_XOP_STDBRX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 8, 0); - break; - - case OP_31_XOP_LDX: - emulated = kvmppc_handle_load(run, vcpu, rt, 8, 1); - break; - - case OP_31_XOP_LDUX: - emulated = kvmppc_handle_load(run, vcpu, rt, 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_LWAX: - emulated = kvmppc_handle_loads(run, vcpu, rt, 4, 1); - break; - - case OP_31_XOP_LWAUX: - emulated = kvmppc_handle_loads(run, vcpu, rt, 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - + } #ifdef CONFIG_PPC_FPU - case OP_31_XOP_LFSX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 4, 1); - break; - - case OP_31_XOP_LFSUX: + case LOAD_FP: if (kvmppc_check_fp_disabled(vcpu)) return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - case OP_31_XOP_LFDX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 8, 1); - break; + if (op.type & FPCONV) + vcpu->arch.mmio_sp64_extend = 1; - case OP_31_XOP_LFDUX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_LFIWAX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_loads(run, vcpu, - KVM_MMIO_REG_FPR|rt, 4, 1); - break; + if (op.type & SIGNEXT) + emulated = kvmppc_handle_loads(run, vcpu, + KVM_MMIO_REG_FPR|op.reg, size, 1); + else + emulated = kvmppc_handle_load(run, vcpu, + KVM_MMIO_REG_FPR|op.reg, size, 1); - case OP_31_XOP_LFIWZX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 4, 1); - break; + if ((op.type & UPDATE) && (emulated != EMULATE_FAIL)) + kvmppc_set_gpr(vcpu, op.update_reg, op.ea); - case OP_31_XOP_STFSX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), 4, 1); break; - - case OP_31_XOP_STFSUX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_STFDX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), 8, 1); - break; - - case OP_31_XOP_STFDUX: - if (kvmppc_check_fp_disabled(vcpu)) +#endif +#ifdef CONFIG_ALTIVEC + case LOAD_VMX: + if (kvmppc_check_altivec_disabled(vcpu)) return EMULATE_DONE; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - case OP_31_XOP_STFIWX: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), 4, 1); + /* Hardware enforces alignment of VMX accesses */ + vcpu->arch.vaddr_accessed &= ~((unsigned long)size - 1); + vcpu->arch.paddr_accessed &= ~((unsigned long)size - 1); + + if (size == 16) { /* lvx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_DWORD; + } else if (size == 4) { /* lvewx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_WORD; + } else if (size == 2) { /* lvehx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_HWORD; + } else if (size == 1) { /* lvebx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_BYTE; + } else + break; + + vcpu->arch.mmio_vmx_offset = + (vcpu->arch.vaddr_accessed & 0xf)/size; + + if (size == 16) { + vcpu->arch.mmio_vmx_copy_nums = 2; + emulated = kvmppc_handle_vmx_load(run, + vcpu, KVM_MMIO_REG_VMX|op.reg, + 8, 1); + } else { + vcpu->arch.mmio_vmx_copy_nums = 1; + emulated = kvmppc_handle_vmx_load(run, vcpu, + KVM_MMIO_REG_VMX|op.reg, + size, 1); + } break; #endif - #ifdef CONFIG_VSX - case OP_31_XOP_LXSDX: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; - emulated = kvmppc_handle_vsx_load(run, vcpu, - KVM_MMIO_REG_VSX|rt, 8, 1, 0); - break; - - case OP_31_XOP_LXSSPX: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_vsx_load(run, vcpu, - KVM_MMIO_REG_VSX|rt, 4, 1, 0); - break; + case LOAD_VSX: { + int io_size_each; + + if (op.vsx_flags & VSX_CHECK_VEC) { + if (kvmppc_check_altivec_disabled(vcpu)) + return EMULATE_DONE; + } else { + if (kvmppc_check_vsx_disabled(vcpu)) + return EMULATE_DONE; + } + + if (op.vsx_flags & VSX_FPCONV) + vcpu->arch.mmio_sp64_extend = 1; + + if (op.element_size == 8) { + if (op.vsx_flags & VSX_SPLAT) + vcpu->arch.mmio_copy_type = + KVMPPC_VSX_COPY_DWORD_LOAD_DUMP; + else + vcpu->arch.mmio_copy_type = + KVMPPC_VSX_COPY_DWORD; + } else if (op.element_size == 4) { + if (op.vsx_flags & VSX_SPLAT) + vcpu->arch.mmio_copy_type = + KVMPPC_VSX_COPY_WORD_LOAD_DUMP; + else + vcpu->arch.mmio_copy_type = + KVMPPC_VSX_COPY_WORD; + } else + break; + + if (size < op.element_size) { + /* precision convert case: lxsspx, etc */ + vcpu->arch.mmio_vsx_copy_nums = 1; + io_size_each = size; + } else { /* lxvw4x, lxvd2x, etc */ + vcpu->arch.mmio_vsx_copy_nums = + size/op.element_size; + io_size_each = op.element_size; + } - case OP_31_XOP_LXSIWAX: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; emulated = kvmppc_handle_vsx_load(run, vcpu, - KVM_MMIO_REG_VSX|rt, 4, 1, 1); - break; - - case OP_31_XOP_LXSIWZX: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; - emulated = kvmppc_handle_vsx_load(run, vcpu, - KVM_MMIO_REG_VSX|rt, 4, 1, 0); + KVM_MMIO_REG_VSX | (op.reg & 0x1f), + io_size_each, 1, op.type & SIGNEXT); break; + } +#endif + case STORE: + /* if need byte reverse, op.val has been reversed by + * analyse_instr(). + */ + emulated = kvmppc_handle_store(run, vcpu, op.val, + size, 1); - case OP_31_XOP_LXVD2X: - /* - * In this case, the official load/store process is like this: - * Step1, exit from vm by page fault isr, then kvm save vsr. - * Please see guest_exit_cont->store_fp_state->SAVE_32VSRS - * as reference. - * - * Step2, copy data between memory and VCPU - * Notice: for LXVD2X/STXVD2X/LXVW4X/STXVW4X, we use - * 2copies*8bytes or 4copies*4bytes - * to simulate one copy of 16bytes. - * Also there is an endian issue here, we should notice the - * layout of memory. - * Please see MARCO of LXVD2X_ROT/STXVD2X_ROT as more reference. - * If host is little-endian, kvm will call XXSWAPD for - * LXVD2X_ROT/STXVD2X_ROT. - * So, if host is little-endian, - * the postion of memeory should be swapped. - * - * Step3, return to guest, kvm reset register. - * Please see kvmppc_hv_entry->load_fp_state->REST_32VSRS - * as reference. - */ - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 2; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; - emulated = kvmppc_handle_vsx_load(run, vcpu, - KVM_MMIO_REG_VSX|rt, 8, 1, 0); - break; + if ((op.type & UPDATE) && (emulated != EMULATE_FAIL)) + kvmppc_set_gpr(vcpu, op.update_reg, op.ea); - case OP_31_XOP_LXVW4X: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 4; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_WORD; - emulated = kvmppc_handle_vsx_load(run, vcpu, - KVM_MMIO_REG_VSX|rt, 4, 1, 0); break; - - case OP_31_XOP_LXVDSX: - if (kvmppc_check_vsx_disabled(vcpu)) +#ifdef CONFIG_PPC_FPU + case STORE_FP: + if (kvmppc_check_fp_disabled(vcpu)) return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = - KVMPPC_VSX_COPY_DWORD_LOAD_DUMP; - emulated = kvmppc_handle_vsx_load(run, vcpu, - KVM_MMIO_REG_VSX|rt, 8, 1, 0); - break; - case OP_31_XOP_STXSDX: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; - emulated = kvmppc_handle_vsx_store(run, vcpu, - rs, 8, 1); - break; + /* The FP registers need to be flushed so that + * kvmppc_handle_store() can read actual FP vals + * from vcpu->arch. + */ + if (vcpu->kvm->arch.kvm_ops->giveup_ext) + vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, + MSR_FP); - case OP_31_XOP_STXSSPX: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_vsx_store(run, vcpu, - rs, 4, 1); - break; + if (op.type & FPCONV) + vcpu->arch.mmio_sp64_extend = 1; - case OP_31_XOP_STXSIWX: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_offset = 1; - vcpu->arch.mmio_vsx_copy_nums = 1; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_WORD; - emulated = kvmppc_handle_vsx_store(run, vcpu, - rs, 4, 1); - break; + emulated = kvmppc_handle_store(run, vcpu, + VCPU_FPR(vcpu, op.reg), size, 1); - case OP_31_XOP_STXVD2X: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 2; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_DWORD; - emulated = kvmppc_handle_vsx_store(run, vcpu, - rs, 8, 1); - break; + if ((op.type & UPDATE) && (emulated != EMULATE_FAIL)) + kvmppc_set_gpr(vcpu, op.update_reg, op.ea); - case OP_31_XOP_STXVW4X: - if (kvmppc_check_vsx_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_vsx_copy_nums = 4; - vcpu->arch.mmio_vsx_copy_type = KVMPPC_VSX_COPY_WORD; - emulated = kvmppc_handle_vsx_store(run, vcpu, - rs, 4, 1); break; -#endif /* CONFIG_VSX */ - +#endif #ifdef CONFIG_ALTIVEC - case OP_31_XOP_LVX: + case STORE_VMX: if (kvmppc_check_altivec_disabled(vcpu)) return EMULATE_DONE; - vcpu->arch.vaddr_accessed &= ~0xFULL; - vcpu->arch.paddr_accessed &= ~0xFULL; - vcpu->arch.mmio_vmx_copy_nums = 2; - emulated = kvmppc_handle_load128_by2x64(run, vcpu, - KVM_MMIO_REG_VMX|rt, 1); - break; - case OP_31_XOP_STVX: - if (kvmppc_check_altivec_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.vaddr_accessed &= ~0xFULL; - vcpu->arch.paddr_accessed &= ~0xFULL; - vcpu->arch.mmio_vmx_copy_nums = 2; - emulated = kvmppc_handle_store128_by2x64(run, vcpu, - rs, 1); - break; -#endif /* CONFIG_ALTIVEC */ + /* Hardware enforces alignment of VMX accesses. */ + vcpu->arch.vaddr_accessed &= ~((unsigned long)size - 1); + vcpu->arch.paddr_accessed &= ~((unsigned long)size - 1); + + if (vcpu->kvm->arch.kvm_ops->giveup_ext) + vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, + MSR_VEC); + if (size == 16) { /* stvx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_DWORD; + } else if (size == 4) { /* stvewx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_WORD; + } else if (size == 2) { /* stvehx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_HWORD; + } else if (size == 1) { /* stvebx */ + vcpu->arch.mmio_copy_type = + KVMPPC_VMX_COPY_BYTE; + } else + break; + + vcpu->arch.mmio_vmx_offset = + (vcpu->arch.vaddr_accessed & 0xf)/size; + + if (size == 16) { + vcpu->arch.mmio_vmx_copy_nums = 2; + emulated = kvmppc_handle_vmx_store(run, + vcpu, op.reg, 8, 1); + } else { + vcpu->arch.mmio_vmx_copy_nums = 1; + emulated = kvmppc_handle_vmx_store(run, + vcpu, op.reg, size, 1); + } - default: - emulated = EMULATE_FAIL; break; - } - break; - - case OP_LWZ: - emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1); - break; - -#ifdef CONFIG_PPC_FPU - case OP_STFS: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), - 4, 1); - break; - - case OP_STFSU: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), - 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_STFD: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), - 8, 1); - break; - - case OP_STFDU: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_store(run, vcpu, - VCPU_FPR(vcpu, rs), - 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; #endif +#ifdef CONFIG_VSX + case STORE_VSX: { + int io_size_each; + + if (op.vsx_flags & VSX_CHECK_VEC) { + if (kvmppc_check_altivec_disabled(vcpu)) + return EMULATE_DONE; + } else { + if (kvmppc_check_vsx_disabled(vcpu)) + return EMULATE_DONE; + } + + if (vcpu->kvm->arch.kvm_ops->giveup_ext) + vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, + MSR_VSX); + + if (op.vsx_flags & VSX_FPCONV) + vcpu->arch.mmio_sp64_extend = 1; + + if (op.element_size == 8) + vcpu->arch.mmio_copy_type = + KVMPPC_VSX_COPY_DWORD; + else if (op.element_size == 4) + vcpu->arch.mmio_copy_type = + KVMPPC_VSX_COPY_WORD; + else + break; + + if (size < op.element_size) { + /* precise conversion case, like stxsspx */ + vcpu->arch.mmio_vsx_copy_nums = 1; + io_size_each = size; + } else { /* stxvw4x, stxvd2x, etc */ + vcpu->arch.mmio_vsx_copy_nums = + size/op.element_size; + io_size_each = op.element_size; + } - case OP_LD: - rt = get_rt(inst); - switch (inst & 3) { - case 0: /* ld */ - emulated = kvmppc_handle_load(run, vcpu, rt, 8, 1); - break; - case 1: /* ldu */ - emulated = kvmppc_handle_load(run, vcpu, rt, 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - case 2: /* lwa */ - emulated = kvmppc_handle_loads(run, vcpu, rt, 4, 1); + emulated = kvmppc_handle_vsx_store(run, vcpu, + op.reg & 0x1f, io_size_each, 1); break; - default: - emulated = EMULATE_FAIL; } - break; - - case OP_LWZU: - emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_LBZ: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - break; - - case OP_LBZU: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_STW: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), - 4, 1); - break; - - case OP_STD: - rs = get_rs(inst); - switch (inst & 3) { - case 0: /* std */ - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 8, 1); - break; - case 1: /* stdu */ - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); +#endif + case CACHEOP: + /* Do nothing. The guest is performing dcbi because + * hardware DMA is not snooped by the dcache, but + * emulated DMA either goes through the dcache as + * normal writes, or the host kernel has handled dcache + * coherence. + */ + emulated = EMULATE_DONE; break; default: - emulated = EMULATE_FAIL; + break; } - break; - - case OP_STWU: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_STB: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 1, 1); - break; - - case OP_STBU: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_LHZ: - emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1); - break; - - case OP_LHZU: - emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_LHA: - emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1); - break; - - case OP_LHAU: - emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_STH: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 2, 1); - break; - - case OP_STHU: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), 2, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - -#ifdef CONFIG_PPC_FPU - case OP_LFS: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 4, 1); - break; - - case OP_LFSU: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - vcpu->arch.mmio_sp64_extend = 1; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 4, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_LFD: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 8, 1); - break; - - case OP_LFDU: - if (kvmppc_check_fp_disabled(vcpu)) - return EMULATE_DONE; - emulated = kvmppc_handle_load(run, vcpu, - KVM_MMIO_REG_FPR|rt, 8, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; -#endif - - default: - emulated = EMULATE_FAIL; - break; } if (emulated == EMULATE_FAIL) { diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 3764d000872e..0e8c20c5eaac 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -648,9 +648,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) #endif #ifdef CONFIG_PPC_TRANSACTIONAL_MEM case KVM_CAP_PPC_HTM: - r = hv_enabled && - (!!(cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM) || - cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)); + r = !!(cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM) || + (hv_enabled && cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)); break; #endif default: @@ -907,6 +906,26 @@ static inline void kvmppc_set_vsr_dword_dump(struct kvm_vcpu *vcpu, } } +static inline void kvmppc_set_vsr_word_dump(struct kvm_vcpu *vcpu, + u32 gpr) +{ + union kvmppc_one_reg val; + int index = vcpu->arch.io_gpr & KVM_MMIO_REG_MASK; + + if (vcpu->arch.mmio_vsx_tx_sx_enabled) { + val.vsx32val[0] = gpr; + val.vsx32val[1] = gpr; + val.vsx32val[2] = gpr; + val.vsx32val[3] = gpr; + VCPU_VSX_VR(vcpu, index) = val.vval; + } else { + val.vsx32val[0] = gpr; + val.vsx32val[1] = gpr; + VCPU_VSX_FPR(vcpu, index, 0) = val.vsxval[0]; + VCPU_VSX_FPR(vcpu, index, 1) = val.vsxval[0]; + } +} + static inline void kvmppc_set_vsr_word(struct kvm_vcpu *vcpu, u32 gpr32) { @@ -933,30 +952,110 @@ static inline void kvmppc_set_vsr_word(struct kvm_vcpu *vcpu, #endif /* CONFIG_VSX */ #ifdef CONFIG_ALTIVEC +static inline int kvmppc_get_vmx_offset_generic(struct kvm_vcpu *vcpu, + int index, int element_size) +{ + int offset; + int elts = sizeof(vector128)/element_size; + + if ((index < 0) || (index >= elts)) + return -1; + + if (kvmppc_need_byteswap(vcpu)) + offset = elts - index - 1; + else + offset = index; + + return offset; +} + +static inline int kvmppc_get_vmx_dword_offset(struct kvm_vcpu *vcpu, + int index) +{ + return kvmppc_get_vmx_offset_generic(vcpu, index, 8); +} + +static inline int kvmppc_get_vmx_word_offset(struct kvm_vcpu *vcpu, + int index) +{ + return kvmppc_get_vmx_offset_generic(vcpu, index, 4); +} + +static inline int kvmppc_get_vmx_hword_offset(struct kvm_vcpu *vcpu, + int index) +{ + return kvmppc_get_vmx_offset_generic(vcpu, index, 2); +} + +static inline int kvmppc_get_vmx_byte_offset(struct kvm_vcpu *vcpu, + int index) +{ + return kvmppc_get_vmx_offset_generic(vcpu, index, 1); +} + + static inline void kvmppc_set_vmx_dword(struct kvm_vcpu *vcpu, - u64 gpr) + u64 gpr) { + union kvmppc_one_reg val; + int offset = kvmppc_get_vmx_dword_offset(vcpu, + vcpu->arch.mmio_vmx_offset); int index = vcpu->arch.io_gpr & KVM_MMIO_REG_MASK; - u32 hi, lo; - u32 di; -#ifdef __BIG_ENDIAN - hi = gpr >> 32; - lo = gpr & 0xffffffff; -#else - lo = gpr >> 32; - hi = gpr & 0xffffffff; -#endif + if (offset == -1) + return; + + val.vval = VCPU_VSX_VR(vcpu, index); + val.vsxval[offset] = gpr; + VCPU_VSX_VR(vcpu, index) = val.vval; +} + +static inline void kvmppc_set_vmx_word(struct kvm_vcpu *vcpu, + u32 gpr32) +{ + union kvmppc_one_reg val; + int offset = kvmppc_get_vmx_word_offset(vcpu, + vcpu->arch.mmio_vmx_offset); + int index = vcpu->arch.io_gpr & KVM_MMIO_REG_MASK; - di = 2 - vcpu->arch.mmio_vmx_copy_nums; /* doubleword index */ - if (di > 1) + if (offset == -1) return; - if (vcpu->arch.mmio_host_swabbed) - di = 1 - di; + val.vval = VCPU_VSX_VR(vcpu, index); + val.vsx32val[offset] = gpr32; + VCPU_VSX_VR(vcpu, index) = val.vval; +} + +static inline void kvmppc_set_vmx_hword(struct kvm_vcpu *vcpu, + u16 gpr16) +{ + union kvmppc_one_reg val; + int offset = kvmppc_get_vmx_hword_offset(vcpu, + vcpu->arch.mmio_vmx_offset); + int index = vcpu->arch.io_gpr & KVM_MMIO_REG_MASK; + + if (offset == -1) + return; + + val.vval = VCPU_VSX_VR(vcpu, index); + val.vsx16val[offset] = gpr16; + VCPU_VSX_VR(vcpu, index) = val.vval; +} + +static inline void kvmppc_set_vmx_byte(struct kvm_vcpu *vcpu, + u8 gpr8) +{ + union kvmppc_one_reg val; + int offset = kvmppc_get_vmx_byte_offset(vcpu, + vcpu->arch.mmio_vmx_offset); + int index = vcpu->arch.io_gpr & KVM_MMIO_REG_MASK; + + if (offset == -1) + return; - VCPU_VSX_VR(vcpu, index).u[di * 2] = hi; - VCPU_VSX_VR(vcpu, index).u[di * 2 + 1] = lo; + val.vval = VCPU_VSX_VR(vcpu, index); + val.vsx8val[offset] = gpr8; + VCPU_VSX_VR(vcpu, index) = val.vval; } #endif /* CONFIG_ALTIVEC */ @@ -1041,6 +1140,9 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, gpr); break; case KVM_MMIO_REG_FPR: + if (vcpu->kvm->arch.kvm_ops->giveup_ext) + vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, MSR_FP); + VCPU_FPR(vcpu, vcpu->arch.io_gpr & KVM_MMIO_REG_MASK) = gpr; break; #ifdef CONFIG_PPC_BOOK3S @@ -1054,18 +1156,36 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, #endif #ifdef CONFIG_VSX case KVM_MMIO_REG_VSX: - if (vcpu->arch.mmio_vsx_copy_type == KVMPPC_VSX_COPY_DWORD) + if (vcpu->kvm->arch.kvm_ops->giveup_ext) + vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, MSR_VSX); + + if (vcpu->arch.mmio_copy_type == KVMPPC_VSX_COPY_DWORD) kvmppc_set_vsr_dword(vcpu, gpr); - else if (vcpu->arch.mmio_vsx_copy_type == KVMPPC_VSX_COPY_WORD) + else if (vcpu->arch.mmio_copy_type == KVMPPC_VSX_COPY_WORD) kvmppc_set_vsr_word(vcpu, gpr); - else if (vcpu->arch.mmio_vsx_copy_type == + else if (vcpu->arch.mmio_copy_type == KVMPPC_VSX_COPY_DWORD_LOAD_DUMP) kvmppc_set_vsr_dword_dump(vcpu, gpr); + else if (vcpu->arch.mmio_copy_type == + KVMPPC_VSX_COPY_WORD_LOAD_DUMP) + kvmppc_set_vsr_word_dump(vcpu, gpr); break; #endif #ifdef CONFIG_ALTIVEC case KVM_MMIO_REG_VMX: - kvmppc_set_vmx_dword(vcpu, gpr); + if (vcpu->kvm->arch.kvm_ops->giveup_ext) + vcpu->kvm->arch.kvm_ops->giveup_ext(vcpu, MSR_VEC); + + if (vcpu->arch.mmio_copy_type == KVMPPC_VMX_COPY_DWORD) + kvmppc_set_vmx_dword(vcpu, gpr); + else if (vcpu->arch.mmio_copy_type == KVMPPC_VMX_COPY_WORD) + kvmppc_set_vmx_word(vcpu, gpr); + else if (vcpu->arch.mmio_copy_type == + KVMPPC_VMX_COPY_HWORD) + kvmppc_set_vmx_hword(vcpu, gpr); + else if (vcpu->arch.mmio_copy_type == + KVMPPC_VMX_COPY_BYTE) + kvmppc_set_vmx_byte(vcpu, gpr); break; #endif default: @@ -1228,7 +1348,7 @@ static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val) u32 dword_offset, word_offset; union kvmppc_one_reg reg; int vsx_offset = 0; - int copy_type = vcpu->arch.mmio_vsx_copy_type; + int copy_type = vcpu->arch.mmio_copy_type; int result = 0; switch (copy_type) { @@ -1344,14 +1464,16 @@ static int kvmppc_emulate_mmio_vsx_loadstore(struct kvm_vcpu *vcpu, #endif /* CONFIG_VSX */ #ifdef CONFIG_ALTIVEC -/* handle quadword load access in two halves */ -int kvmppc_handle_load128_by2x64(struct kvm_run *run, struct kvm_vcpu *vcpu, - unsigned int rt, int is_default_endian) +int kvmppc_handle_vmx_load(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int rt, unsigned int bytes, int is_default_endian) { enum emulation_result emulated = EMULATE_DONE; + if (vcpu->arch.mmio_vsx_copy_nums > 2) + return EMULATE_FAIL; + while (vcpu->arch.mmio_vmx_copy_nums) { - emulated = __kvmppc_handle_load(run, vcpu, rt, 8, + emulated = __kvmppc_handle_load(run, vcpu, rt, bytes, is_default_endian, 0); if (emulated != EMULATE_DONE) @@ -1359,55 +1481,127 @@ int kvmppc_handle_load128_by2x64(struct kvm_run *run, struct kvm_vcpu *vcpu, vcpu->arch.paddr_accessed += run->mmio.len; vcpu->arch.mmio_vmx_copy_nums--; + vcpu->arch.mmio_vmx_offset++; } return emulated; } -static inline int kvmppc_get_vmx_data(struct kvm_vcpu *vcpu, int rs, u64 *val) +int kvmppc_get_vmx_dword(struct kvm_vcpu *vcpu, int index, u64 *val) { - vector128 vrs = VCPU_VSX_VR(vcpu, rs); - u32 di; - u64 w0, w1; + union kvmppc_one_reg reg; + int vmx_offset = 0; + int result = 0; - di = 2 - vcpu->arch.mmio_vmx_copy_nums; /* doubleword index */ - if (di > 1) + vmx_offset = + kvmppc_get_vmx_dword_offset(vcpu, vcpu->arch.mmio_vmx_offset); + + if (vmx_offset == -1) return -1; - if (vcpu->arch.mmio_host_swabbed) - di = 1 - di; + reg.vval = VCPU_VSX_VR(vcpu, index); + *val = reg.vsxval[vmx_offset]; - w0 = vrs.u[di * 2]; - w1 = vrs.u[di * 2 + 1]; + return result; +} -#ifdef __BIG_ENDIAN - *val = (w0 << 32) | w1; -#else - *val = (w1 << 32) | w0; -#endif - return 0; +int kvmppc_get_vmx_word(struct kvm_vcpu *vcpu, int index, u64 *val) +{ + union kvmppc_one_reg reg; + int vmx_offset = 0; + int result = 0; + + vmx_offset = + kvmppc_get_vmx_word_offset(vcpu, vcpu->arch.mmio_vmx_offset); + + if (vmx_offset == -1) + return -1; + + reg.vval = VCPU_VSX_VR(vcpu, index); + *val = reg.vsx32val[vmx_offset]; + + return result; +} + +int kvmppc_get_vmx_hword(struct kvm_vcpu *vcpu, int index, u64 *val) +{ + union kvmppc_one_reg reg; + int vmx_offset = 0; + int result = 0; + + vmx_offset = + kvmppc_get_vmx_hword_offset(vcpu, vcpu->arch.mmio_vmx_offset); + + if (vmx_offset == -1) + return -1; + + reg.vval = VCPU_VSX_VR(vcpu, index); + *val = reg.vsx16val[vmx_offset]; + + return result; } -/* handle quadword store in two halves */ -int kvmppc_handle_store128_by2x64(struct kvm_run *run, struct kvm_vcpu *vcpu, - unsigned int rs, int is_default_endian) +int kvmppc_get_vmx_byte(struct kvm_vcpu *vcpu, int index, u64 *val) +{ + union kvmppc_one_reg reg; + int vmx_offset = 0; + int result = 0; + + vmx_offset = + kvmppc_get_vmx_byte_offset(vcpu, vcpu->arch.mmio_vmx_offset); + + if (vmx_offset == -1) + return -1; + + reg.vval = VCPU_VSX_VR(vcpu, index); + *val = reg.vsx8val[vmx_offset]; + + return result; +} + +int kvmppc_handle_vmx_store(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int rs, unsigned int bytes, int is_default_endian) { u64 val = 0; + unsigned int index = rs & KVM_MMIO_REG_MASK; enum emulation_result emulated = EMULATE_DONE; + if (vcpu->arch.mmio_vsx_copy_nums > 2) + return EMULATE_FAIL; + vcpu->arch.io_gpr = rs; while (vcpu->arch.mmio_vmx_copy_nums) { - if (kvmppc_get_vmx_data(vcpu, rs, &val) == -1) + switch (vcpu->arch.mmio_copy_type) { + case KVMPPC_VMX_COPY_DWORD: + if (kvmppc_get_vmx_dword(vcpu, index, &val) == -1) + return EMULATE_FAIL; + + break; + case KVMPPC_VMX_COPY_WORD: + if (kvmppc_get_vmx_word(vcpu, index, &val) == -1) + return EMULATE_FAIL; + break; + case KVMPPC_VMX_COPY_HWORD: + if (kvmppc_get_vmx_hword(vcpu, index, &val) == -1) + return EMULATE_FAIL; + break; + case KVMPPC_VMX_COPY_BYTE: + if (kvmppc_get_vmx_byte(vcpu, index, &val) == -1) + return EMULATE_FAIL; + break; + default: return EMULATE_FAIL; + } - emulated = kvmppc_handle_store(run, vcpu, val, 8, + emulated = kvmppc_handle_store(run, vcpu, val, bytes, is_default_endian); if (emulated != EMULATE_DONE) break; vcpu->arch.paddr_accessed += run->mmio.len; vcpu->arch.mmio_vmx_copy_nums--; + vcpu->arch.mmio_vmx_offset++; } return emulated; @@ -1422,11 +1616,11 @@ static int kvmppc_emulate_mmio_vmx_loadstore(struct kvm_vcpu *vcpu, vcpu->arch.paddr_accessed += run->mmio.len; if (!vcpu->mmio_is_write) { - emulated = kvmppc_handle_load128_by2x64(run, vcpu, - vcpu->arch.io_gpr, 1); + emulated = kvmppc_handle_vmx_load(run, vcpu, + vcpu->arch.io_gpr, run->mmio.len, 1); } else { - emulated = kvmppc_handle_store128_by2x64(run, vcpu, - vcpu->arch.io_gpr, 1); + emulated = kvmppc_handle_vmx_store(run, vcpu, + vcpu->arch.io_gpr, run->mmio.len, 1); } switch (emulated) { @@ -1570,8 +1764,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) } #endif #ifdef CONFIG_ALTIVEC - if (vcpu->arch.mmio_vmx_copy_nums > 0) + if (vcpu->arch.mmio_vmx_copy_nums > 0) { vcpu->arch.mmio_vmx_copy_nums--; + vcpu->arch.mmio_vmx_offset++; + } if (vcpu->arch.mmio_vmx_copy_nums > 0) { r = kvmppc_emulate_mmio_vmx_loadstore(vcpu, run); @@ -1784,16 +1980,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp, void __user *argp = (void __user *)arg; long r; - vcpu_load(vcpu); - switch (ioctl) { case KVM_ENABLE_CAP: { struct kvm_enable_cap cap; r = -EFAULT; + vcpu_load(vcpu); if (copy_from_user(&cap, argp, sizeof(cap))) goto out; r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap); + vcpu_put(vcpu); break; } @@ -1815,9 +2011,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_DIRTY_TLB: { struct kvm_dirty_tlb dirty; r = -EFAULT; + vcpu_load(vcpu); if (copy_from_user(&dirty, argp, sizeof(dirty))) goto out; r = kvm_vcpu_ioctl_dirty_tlb(vcpu, &dirty); + vcpu_put(vcpu); break; } #endif @@ -1826,7 +2024,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp, } out: - vcpu_put(vcpu); return r; } diff --git a/arch/powerpc/kvm/tm.S b/arch/powerpc/kvm/tm.S new file mode 100644 index 000000000000..90e330f21356 --- /dev/null +++ b/arch/powerpc/kvm/tm.S @@ -0,0 +1,384 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Derived from book3s_hv_rmhandlers.S, which is: + * + * Copyright 2011 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com> + * + */ + +#include <asm/reg.h> +#include <asm/ppc_asm.h> +#include <asm/asm-offsets.h> +#include <asm/export.h> +#include <asm/tm.h> +#include <asm/cputable.h> + +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +#define VCPU_GPRS_TM(reg) (((reg) * ULONG_SIZE) + VCPU_GPR_TM) + +/* + * Save transactional state and TM-related registers. + * Called with: + * - r3 pointing to the vcpu struct + * - r4 points to the MSR with current TS bits: + * (For HV KVM, it is VCPU_MSR ; For PR KVM, it is host MSR). + * This can modify all checkpointed registers, but + * restores r1, r2 before exit. + */ +_GLOBAL(__kvmppc_save_tm) + mflr r0 + std r0, PPC_LR_STKOFF(r1) + + /* Turn on TM. */ + mfmsr r8 + li r0, 1 + rldimi r8, r0, MSR_TM_LG, 63-MSR_TM_LG + ori r8, r8, MSR_FP + oris r8, r8, (MSR_VEC | MSR_VSX)@h + mtmsrd r8 + + rldicl. r4, r4, 64 - MSR_TS_S_LG, 62 + beq 1f /* TM not active in guest. */ + + std r1, HSTATE_SCRATCH2(r13) + std r3, HSTATE_SCRATCH1(r13) + +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE +BEGIN_FTR_SECTION + /* Emulation of the treclaim instruction needs TEXASR before treclaim */ + mfspr r6, SPRN_TEXASR + std r6, VCPU_ORIG_TEXASR(r3) +END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_HV_ASSIST) +#endif + + /* Clear the MSR RI since r1, r13 are all going to be foobar. */ + li r5, 0 + mtmsrd r5, 1 + + li r3, TM_CAUSE_KVM_RESCHED + + /* All GPRs are volatile at this point. */ + TRECLAIM(R3) + + /* Temporarily store r13 and r9 so we have some regs to play with */ + SET_SCRATCH0(r13) + GET_PACA(r13) + std r9, PACATMSCRATCH(r13) + ld r9, HSTATE_SCRATCH1(r13) + + /* Get a few more GPRs free. */ + std r29, VCPU_GPRS_TM(29)(r9) + std r30, VCPU_GPRS_TM(30)(r9) + std r31, VCPU_GPRS_TM(31)(r9) + + /* Save away PPR and DSCR soon so don't run with user values. */ + mfspr r31, SPRN_PPR + HMT_MEDIUM + mfspr r30, SPRN_DSCR +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + ld r29, HSTATE_DSCR(r13) + mtspr SPRN_DSCR, r29 +#endif + + /* Save all but r9, r13 & r29-r31 */ + reg = 0 + .rept 29 + .if (reg != 9) && (reg != 13) + std reg, VCPU_GPRS_TM(reg)(r9) + .endif + reg = reg + 1 + .endr + /* ... now save r13 */ + GET_SCRATCH0(r4) + std r4, VCPU_GPRS_TM(13)(r9) + /* ... and save r9 */ + ld r4, PACATMSCRATCH(r13) + std r4, VCPU_GPRS_TM(9)(r9) + + /* Reload stack pointer and TOC. */ + ld r1, HSTATE_SCRATCH2(r13) + ld r2, PACATOC(r13) + + /* Set MSR RI now we have r1 and r13 back. */ + li r5, MSR_RI + mtmsrd r5, 1 + + /* Save away checkpinted SPRs. */ + std r31, VCPU_PPR_TM(r9) + std r30, VCPU_DSCR_TM(r9) + mflr r5 + mfcr r6 + mfctr r7 + mfspr r8, SPRN_AMR + mfspr r10, SPRN_TAR + mfxer r11 + std r5, VCPU_LR_TM(r9) + stw r6, VCPU_CR_TM(r9) + std r7, VCPU_CTR_TM(r9) + std r8, VCPU_AMR_TM(r9) + std r10, VCPU_TAR_TM(r9) + std r11, VCPU_XER_TM(r9) + + /* Restore r12 as trap number. */ + lwz r12, VCPU_TRAP(r9) + + /* Save FP/VSX. */ + addi r3, r9, VCPU_FPRS_TM + bl store_fp_state + addi r3, r9, VCPU_VRS_TM + bl store_vr_state + mfspr r6, SPRN_VRSAVE + stw r6, VCPU_VRSAVE_TM(r9) +1: + /* + * We need to save these SPRs after the treclaim so that the software + * error code is recorded correctly in the TEXASR. Also the user may + * change these outside of a transaction, so they must always be + * context switched. + */ + mfspr r7, SPRN_TEXASR + std r7, VCPU_TEXASR(r9) +11: + mfspr r5, SPRN_TFHAR + mfspr r6, SPRN_TFIAR + std r5, VCPU_TFHAR(r9) + std r6, VCPU_TFIAR(r9) + + ld r0, PPC_LR_STKOFF(r1) + mtlr r0 + blr + +/* + * _kvmppc_save_tm_pr() is a wrapper around __kvmppc_save_tm(), so that it can + * be invoked from C function by PR KVM only. + */ +_GLOBAL(_kvmppc_save_tm_pr) + mflr r5 + std r5, PPC_LR_STKOFF(r1) + stdu r1, -SWITCH_FRAME_SIZE(r1) + SAVE_NVGPRS(r1) + + /* save MSR since TM/math bits might be impacted + * by __kvmppc_save_tm(). + */ + mfmsr r5 + SAVE_GPR(5, r1) + + /* also save DSCR/CR/TAR so that it can be recovered later */ + mfspr r6, SPRN_DSCR + SAVE_GPR(6, r1) + + mfcr r7 + stw r7, _CCR(r1) + + mfspr r8, SPRN_TAR + SAVE_GPR(8, r1) + + bl __kvmppc_save_tm + + REST_GPR(8, r1) + mtspr SPRN_TAR, r8 + + ld r7, _CCR(r1) + mtcr r7 + + REST_GPR(6, r1) + mtspr SPRN_DSCR, r6 + + /* need preserve current MSR's MSR_TS bits */ + REST_GPR(5, r1) + mfmsr r6 + rldicl r6, r6, 64 - MSR_TS_S_LG, 62 + rldimi r5, r6, MSR_TS_S_LG, 63 - MSR_TS_T_LG + mtmsrd r5 + + REST_NVGPRS(r1) + addi r1, r1, SWITCH_FRAME_SIZE + ld r5, PPC_LR_STKOFF(r1) + mtlr r5 + blr + +EXPORT_SYMBOL_GPL(_kvmppc_save_tm_pr); + +/* + * Restore transactional state and TM-related registers. + * Called with: + * - r3 pointing to the vcpu struct. + * - r4 is the guest MSR with desired TS bits: + * For HV KVM, it is VCPU_MSR + * For PR KVM, it is provided by caller + * This potentially modifies all checkpointed registers. + * It restores r1, r2 from the PACA. + */ +_GLOBAL(__kvmppc_restore_tm) + mflr r0 + std r0, PPC_LR_STKOFF(r1) + + /* Turn on TM/FP/VSX/VMX so we can restore them. */ + mfmsr r5 + li r6, MSR_TM >> 32 + sldi r6, r6, 32 + or r5, r5, r6 + ori r5, r5, MSR_FP + oris r5, r5, (MSR_VEC | MSR_VSX)@h + mtmsrd r5 + + /* + * The user may change these outside of a transaction, so they must + * always be context switched. + */ + ld r5, VCPU_TFHAR(r3) + ld r6, VCPU_TFIAR(r3) + ld r7, VCPU_TEXASR(r3) + mtspr SPRN_TFHAR, r5 + mtspr SPRN_TFIAR, r6 + mtspr SPRN_TEXASR, r7 + + mr r5, r4 + rldicl. r5, r5, 64 - MSR_TS_S_LG, 62 + beqlr /* TM not active in guest */ + std r1, HSTATE_SCRATCH2(r13) + + /* Make sure the failure summary is set, otherwise we'll program check + * when we trechkpt. It's possible that this might have been not set + * on a kvmppc_set_one_reg() call but we shouldn't let this crash the + * host. + */ + oris r7, r7, (TEXASR_FS)@h + mtspr SPRN_TEXASR, r7 + + /* + * We need to load up the checkpointed state for the guest. + * We need to do this early as it will blow away any GPRs, VSRs and + * some SPRs. + */ + + mr r31, r3 + addi r3, r31, VCPU_FPRS_TM + bl load_fp_state + addi r3, r31, VCPU_VRS_TM + bl load_vr_state + mr r3, r31 + lwz r7, VCPU_VRSAVE_TM(r3) + mtspr SPRN_VRSAVE, r7 + + ld r5, VCPU_LR_TM(r3) + lwz r6, VCPU_CR_TM(r3) + ld r7, VCPU_CTR_TM(r3) + ld r8, VCPU_AMR_TM(r3) + ld r9, VCPU_TAR_TM(r3) + ld r10, VCPU_XER_TM(r3) + mtlr r5 + mtcr r6 + mtctr r7 + mtspr SPRN_AMR, r8 + mtspr SPRN_TAR, r9 + mtxer r10 + + /* + * Load up PPR and DSCR values but don't put them in the actual SPRs + * till the last moment to avoid running with userspace PPR and DSCR for + * too long. + */ + ld r29, VCPU_DSCR_TM(r3) + ld r30, VCPU_PPR_TM(r3) + + std r2, PACATMSCRATCH(r13) /* Save TOC */ + + /* Clear the MSR RI since r1, r13 are all going to be foobar. */ + li r5, 0 + mtmsrd r5, 1 + + /* Load GPRs r0-r28 */ + reg = 0 + .rept 29 + ld reg, VCPU_GPRS_TM(reg)(r31) + reg = reg + 1 + .endr + + mtspr SPRN_DSCR, r29 + mtspr SPRN_PPR, r30 + + /* Load final GPRs */ + ld 29, VCPU_GPRS_TM(29)(r31) + ld 30, VCPU_GPRS_TM(30)(r31) + ld 31, VCPU_GPRS_TM(31)(r31) + + /* TM checkpointed state is now setup. All GPRs are now volatile. */ + TRECHKPT + + /* Now let's get back the state we need. */ + HMT_MEDIUM + GET_PACA(r13) +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + ld r29, HSTATE_DSCR(r13) + mtspr SPRN_DSCR, r29 +#endif + ld r1, HSTATE_SCRATCH2(r13) + ld r2, PACATMSCRATCH(r13) + + /* Set the MSR RI since we have our registers back. */ + li r5, MSR_RI + mtmsrd r5, 1 + ld r0, PPC_LR_STKOFF(r1) + mtlr r0 + blr + +/* + * _kvmppc_restore_tm_pr() is a wrapper around __kvmppc_restore_tm(), so that it + * can be invoked from C function by PR KVM only. + */ +_GLOBAL(_kvmppc_restore_tm_pr) + mflr r5 + std r5, PPC_LR_STKOFF(r1) + stdu r1, -SWITCH_FRAME_SIZE(r1) + SAVE_NVGPRS(r1) + + /* save MSR to avoid TM/math bits change */ + mfmsr r5 + SAVE_GPR(5, r1) + + /* also save DSCR/CR/TAR so that it can be recovered later */ + mfspr r6, SPRN_DSCR + SAVE_GPR(6, r1) + + mfcr r7 + stw r7, _CCR(r1) + + mfspr r8, SPRN_TAR + SAVE_GPR(8, r1) + + bl __kvmppc_restore_tm + + REST_GPR(8, r1) + mtspr SPRN_TAR, r8 + + ld r7, _CCR(r1) + mtcr r7 + + REST_GPR(6, r1) + mtspr SPRN_DSCR, r6 + + /* need preserve current MSR's MSR_TS bits */ + REST_GPR(5, r1) + mfmsr r6 + rldicl r6, r6, 64 - MSR_TS_S_LG, 62 + rldimi r5, r6, MSR_TS_S_LG, 63 - MSR_TS_T_LG + mtmsrd r5 + + REST_NVGPRS(r1) + addi r1, r1, SWITCH_FRAME_SIZE + ld r5, PPC_LR_STKOFF(r1) + mtlr r5 + blr + +EXPORT_SYMBOL_GPL(_kvmppc_restore_tm_pr); +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 42e581a268e1..f12680c9b947 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -32,6 +32,7 @@ config RISCV select HAVE_MEMBLOCK_NODE_MAP select HAVE_DMA_CONTIGUOUS select HAVE_GENERIC_DMA_COHERENT + select HAVE_PERF_EVENTS select IRQ_DOMAIN select NO_BOOTMEM select RISCV_ISA_A if SMP @@ -193,6 +194,19 @@ config RISCV_ISA_C config RISCV_ISA_A def_bool y +menu "supported PMU type" + depends on PERF_EVENTS + +config RISCV_BASE_PMU + bool "Base Performance Monitoring Unit" + def_bool y + help + A base PMU that serves as a reference implementation and has limited + feature of perf. It can run on any RISC-V machines so serves as the + fallback, but this option can also be disable to reduce kernel size. + +endmenu + endmenu menu "Kernel type" diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index 76e958a5414a..6d4a5f6c3f4f 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -71,6 +71,9 @@ KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-relax) # architectures. It's faster to have GCC emit only aligned accesses. KBUILD_CFLAGS += $(call cc-option,-mstrict-align) +# arch specific predefines for sparse +CHECKFLAGS += -D__riscv -D__riscv_xlen=$(BITS) + head-y := arch/riscv/kernel/head.o core-y += arch/riscv/kernel/ arch/riscv/mm/ diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig index bca0eee733b0..07326466871b 100644 --- a/arch/riscv/configs/defconfig +++ b/arch/riscv/configs/defconfig @@ -44,6 +44,7 @@ CONFIG_INPUT_MOUSEDEV=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_OF_PLATFORM=y +CONFIG_HVC_RISCV_SBI=y # CONFIG_PTP_1588_CLOCK is not set CONFIG_DRM=y CONFIG_DRM_RADEON=y diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild index 4286a5f83876..576ffdca06ba 100644 --- a/arch/riscv/include/asm/Kbuild +++ b/arch/riscv/include/asm/Kbuild @@ -25,6 +25,7 @@ generic-y += kdebug.h generic-y += kmap_types.h generic-y += kvm_para.h generic-y += local.h +generic-y += local64.h generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += module.h diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h index efd89a88d2d0..8f13074413a7 100644 --- a/arch/riscv/include/asm/cacheflush.h +++ b/arch/riscv/include/asm/cacheflush.h @@ -47,7 +47,7 @@ static inline void flush_dcache_page(struct page *page) #else /* CONFIG_SMP */ -#define flush_icache_all() sbi_remote_fence_i(0) +#define flush_icache_all() sbi_remote_fence_i(NULL) void flush_icache_mm(struct mm_struct *mm, bool local); #endif /* CONFIG_SMP */ diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h new file mode 100644 index 000000000000..0e638a0c3feb --- /dev/null +++ b/arch/riscv/include/asm/perf_event.h @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2018 SiFive + * Copyright (C) 2018 Andes Technology Corporation + * + */ + +#ifndef _ASM_RISCV_PERF_EVENT_H +#define _ASM_RISCV_PERF_EVENT_H + +#include <linux/perf_event.h> +#include <linux/ptrace.h> + +#define RISCV_BASE_COUNTERS 2 + +/* + * The RISCV_MAX_COUNTERS parameter should be specified. + */ + +#ifdef CONFIG_RISCV_BASE_PMU +#define RISCV_MAX_COUNTERS 2 +#endif + +#ifndef RISCV_MAX_COUNTERS +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU." +#endif + +/* + * These are the indexes of bits in counteren register *minus* 1, + * except for cycle. It would be coherent if it can directly mapped + * to counteren bit definition, but there is a *time* register at + * counteren[1]. Per-cpu structure is scarce resource here. + * + * According to the spec, an implementation can support counter up to + * mhpmcounter31, but many high-end processors has at most 6 general + * PMCs, we give the definition to MHPMCOUNTER8 here. + */ +#define RISCV_PMU_CYCLE 0 +#define RISCV_PMU_INSTRET 1 +#define RISCV_PMU_MHPMCOUNTER3 2 +#define RISCV_PMU_MHPMCOUNTER4 3 +#define RISCV_PMU_MHPMCOUNTER5 4 +#define RISCV_PMU_MHPMCOUNTER6 5 +#define RISCV_PMU_MHPMCOUNTER7 6 +#define RISCV_PMU_MHPMCOUNTER8 7 + +#define RISCV_OP_UNSUPP (-EOPNOTSUPP) + +struct cpu_hw_events { + /* # currently enabled events*/ + int n_events; + /* currently enabled events */ + struct perf_event *events[RISCV_MAX_COUNTERS]; + /* vendor-defined PMU data */ + void *platform; +}; + +struct riscv_pmu { + struct pmu *pmu; + + /* generic hw/cache events table */ + const int *hw_events; + const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX]; + /* method used to map hw/cache events */ + int (*map_hw_event)(u64 config); + int (*map_cache_event)(u64 config); + + /* max generic hw events in map */ + int max_events; + /* number total counters, 2(base) + x(general) */ + int num_counters; + /* the width of the counter */ + int counter_width; + + /* vendor-defined PMU features */ + void *platform; + + irqreturn_t (*handle_irq)(int irq_num, void *dev); + int irq; +}; + +#endif /* _ASM_RISCV_PERF_EVENT_H */ diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index 7b209aec355d..85c2d8bae957 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -49,7 +49,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, #include <asm/sbi.h> -#define flush_tlb_all() sbi_remote_sfence_vma(0, 0, -1) +#define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1) #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0) #define flush_tlb_range(vma, start, end) \ sbi_remote_sfence_vma(mm_cpumask((vma)->vm_mm)->bits, \ diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h index 14b0b22fb578..473cfc84e412 100644 --- a/arch/riscv/include/asm/uaccess.h +++ b/arch/riscv/include/asm/uaccess.h @@ -392,19 +392,21 @@ do { \ }) -extern unsigned long __must_check __copy_user(void __user *to, +extern unsigned long __must_check __asm_copy_to_user(void __user *to, + const void *from, unsigned long n); +extern unsigned long __must_check __asm_copy_from_user(void *to, const void __user *from, unsigned long n); static inline unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n) { - return __copy_user(to, from, n); + return __asm_copy_to_user(to, from, n); } static inline unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n) { - return __copy_user(to, from, n); + return __asm_copy_from_user(to, from, n); } extern long strncpy_from_user(char *dest, const char __user *src, long count); diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 8586dd96c2f0..e1274fc03af4 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -39,4 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o +obj-$(CONFIG_PERF_EVENTS) += perf_event.o + clean: diff --git a/arch/riscv/kernel/mcount.S b/arch/riscv/kernel/mcount.S index ce9bdc57a2a1..5721624886a1 100644 --- a/arch/riscv/kernel/mcount.S +++ b/arch/riscv/kernel/mcount.S @@ -126,5 +126,5 @@ do_trace: RESTORE_ABI_STATE ret ENDPROC(_mcount) -EXPORT_SYMBOL(_mcount) #endif +EXPORT_SYMBOL(_mcount) diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c index 5dddba301d0a..1d5e9b934b8c 100644 --- a/arch/riscv/kernel/module.c +++ b/arch/riscv/kernel/module.c @@ -17,6 +17,17 @@ #include <linux/errno.h> #include <linux/moduleloader.h> +static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v) +{ + if (v != (u32)v) { + pr_err("%s: value %016llx out of range for 32-bit field\n", + me->name, v); + return -EINVAL; + } + *location = v; + return 0; +} + static int apply_r_riscv_64_rela(struct module *me, u32 *location, Elf_Addr v) { *(u64 *)location = v; @@ -265,6 +276,7 @@ static int apply_r_riscv_sub32_rela(struct module *me, u32 *location, static int (*reloc_handlers_rela[]) (struct module *me, u32 *location, Elf_Addr v) = { + [R_RISCV_32] = apply_r_riscv_32_rela, [R_RISCV_64] = apply_r_riscv_64_rela, [R_RISCV_BRANCH] = apply_r_riscv_branch_rela, [R_RISCV_JAL] = apply_r_riscv_jal_rela, diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c new file mode 100644 index 000000000000..b0e10c4e9f77 --- /dev/null +++ b/arch/riscv/kernel/perf_event.c @@ -0,0 +1,485 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de> + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar + * Copyright (C) 2009 Jaswinder Singh Rajput + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra + * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@intel.com> + * Copyright (C) 2009 Google, Inc., Stephane Eranian + * Copyright 2014 Tilera Corporation. All Rights Reserved. + * Copyright (C) 2018 Andes Technology Corporation + * + * Perf_events support for RISC-V platforms. + * + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough + * functionality for perf event to fully work, this file provides + * the very basic framework only. + * + * For platform portings, please check Documentations/riscv/pmu.txt. + * + * The Copyright line includes x86 and tile ones. + */ + +#include <linux/kprobes.h> +#include <linux/kernel.h> +#include <linux/kdebug.h> +#include <linux/mutex.h> +#include <linux/bitmap.h> +#include <linux/irq.h> +#include <linux/interrupt.h> +#include <linux/perf_event.h> +#include <linux/atomic.h> +#include <linux/of.h> +#include <asm/perf_event.h> + +static const struct riscv_pmu *riscv_pmu __read_mostly; +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events); + +/* + * Hardware & cache maps and their methods + */ + +static const int riscv_hw_event_map[] = { + [PERF_COUNT_HW_CPU_CYCLES] = RISCV_PMU_CYCLE, + [PERF_COUNT_HW_INSTRUCTIONS] = RISCV_PMU_INSTRET, + [PERF_COUNT_HW_CACHE_REFERENCES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_CACHE_MISSES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BRANCH_MISSES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BUS_CYCLES] = RISCV_OP_UNSUPP, +}; + +#define C(x) PERF_COUNT_HW_CACHE_##x +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX] +[PERF_COUNT_HW_CACHE_OP_MAX] +[PERF_COUNT_HW_CACHE_RESULT_MAX] = { + [C(L1D)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(L1I)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(LL)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(DTLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(ITLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(BPU)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, +}; + +static int riscv_map_hw_event(u64 config) +{ + if (config >= riscv_pmu->max_events) + return -EINVAL; + + return riscv_pmu->hw_events[config]; +} + +int riscv_map_cache_decode(u64 config, unsigned int *type, + unsigned int *op, unsigned int *result) +{ + return -ENOENT; +} + +static int riscv_map_cache_event(u64 config) +{ + unsigned int type, op, result; + int err = -ENOENT; + int code; + + err = riscv_map_cache_decode(config, &type, &op, &result); + if (!riscv_pmu->cache_events || err) + return err; + + if (type >= PERF_COUNT_HW_CACHE_MAX || + op >= PERF_COUNT_HW_CACHE_OP_MAX || + result >= PERF_COUNT_HW_CACHE_RESULT_MAX) + return -EINVAL; + + code = (*riscv_pmu->cache_events)[type][op][result]; + if (code == RISCV_OP_UNSUPP) + return -EINVAL; + + return code; +} + +/* + * Low-level functions: reading/writing counters + */ + +static inline u64 read_counter(int idx) +{ + u64 val = 0; + + switch (idx) { + case RISCV_PMU_CYCLE: + val = csr_read(cycle); + break; + case RISCV_PMU_INSTRET: + val = csr_read(instret); + break; + default: + WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS); + return -EINVAL; + } + + return val; +} + +static inline void write_counter(int idx, u64 value) +{ + /* currently not supported */ + WARN_ON_ONCE(1); +} + +/* + * pmu->read: read and update the counter + * + * Other architectures' implementation often have a xxx_perf_event_update + * routine, which can return counter values when called in the IRQ, but + * return void when being called by the pmu->read method. + */ +static void riscv_pmu_read(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u64 prev_raw_count, new_raw_count; + u64 oldval; + int idx = hwc->idx; + u64 delta; + + do { + prev_raw_count = local64_read(&hwc->prev_count); + new_raw_count = read_counter(idx); + + oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count, + new_raw_count); + } while (oldval != prev_raw_count); + + /* + * delta is the value to update the counter we maintain in the kernel. + */ + delta = (new_raw_count - prev_raw_count) & + ((1ULL << riscv_pmu->counter_width) - 1); + local64_add(delta, &event->count); + /* + * Something like local64_sub(delta, &hwc->period_left) here is + * needed if there is an interrupt for perf. + */ +} + +/* + * State transition functions: + * + * stop()/start() & add()/del() + */ + +/* + * pmu->stop: stop the counter + */ +static void riscv_pmu_stop(struct perf_event *event, int flags) +{ + struct hw_perf_event *hwc = &event->hw; + + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); + hwc->state |= PERF_HES_STOPPED; + + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) { + riscv_pmu->pmu->read(event); + hwc->state |= PERF_HES_UPTODATE; + } +} + +/* + * pmu->start: start the event. + */ +static void riscv_pmu_start(struct perf_event *event, int flags) +{ + struct hw_perf_event *hwc = &event->hw; + + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) + return; + + if (flags & PERF_EF_RELOAD) { + WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE)); + + /* + * Set the counter to the period to the next interrupt here, + * if you have any. + */ + } + + hwc->state = 0; + perf_event_update_userpage(event); + + /* + * Since we cannot write to counters, this serves as an initialization + * to the delta-mechanism in pmu->read(); otherwise, the delta would be + * wrong when pmu->read is called for the first time. + */ + local64_set(&hwc->prev_count, read_counter(hwc->idx)); +} + +/* + * pmu->add: add the event to PMU. + */ +static int riscv_pmu_add(struct perf_event *event, int flags) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct hw_perf_event *hwc = &event->hw; + + if (cpuc->n_events == riscv_pmu->num_counters) + return -ENOSPC; + + /* + * We don't have general conunters, so no binding-event-to-counter + * process here. + * + * Indexing using hwc->config generally not works, since config may + * contain extra information, but here the only info we have in + * hwc->config is the event index. + */ + hwc->idx = hwc->config; + cpuc->events[hwc->idx] = event; + cpuc->n_events++; + + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; + + if (flags & PERF_EF_START) + riscv_pmu->pmu->start(event, PERF_EF_RELOAD); + + return 0; +} + +/* + * pmu->del: delete the event from PMU. + */ +static void riscv_pmu_del(struct perf_event *event, int flags) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct hw_perf_event *hwc = &event->hw; + + cpuc->events[hwc->idx] = NULL; + cpuc->n_events--; + riscv_pmu->pmu->stop(event, PERF_EF_UPDATE); + perf_event_update_userpage(event); +} + +/* + * Interrupt: a skeletion for reference. + */ + +static DEFINE_MUTEX(pmc_reserve_mutex); + +irqreturn_t riscv_base_pmu_handle_irq(int irq_num, void *dev) +{ + return IRQ_NONE; +} + +static int reserve_pmc_hardware(void) +{ + int err = 0; + + mutex_lock(&pmc_reserve_mutex); + if (riscv_pmu->irq >= 0 && riscv_pmu->handle_irq) { + err = request_irq(riscv_pmu->irq, riscv_pmu->handle_irq, + IRQF_PERCPU, "riscv-base-perf", NULL); + } + mutex_unlock(&pmc_reserve_mutex); + + return err; +} + +void release_pmc_hardware(void) +{ + mutex_lock(&pmc_reserve_mutex); + if (riscv_pmu->irq >= 0) + free_irq(riscv_pmu->irq, NULL); + mutex_unlock(&pmc_reserve_mutex); +} + +/* + * Event Initialization/Finalization + */ + +static atomic_t riscv_active_events = ATOMIC_INIT(0); + +static void riscv_event_destroy(struct perf_event *event) +{ + if (atomic_dec_return(&riscv_active_events) == 0) + release_pmc_hardware(); +} + +static int riscv_event_init(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + struct hw_perf_event *hwc = &event->hw; + int err; + int code; + + if (atomic_inc_return(&riscv_active_events) == 1) { + err = reserve_pmc_hardware(); + + if (err) { + pr_warn("PMC hardware not available\n"); + atomic_dec(&riscv_active_events); + return -EBUSY; + } + } + + switch (event->attr.type) { + case PERF_TYPE_HARDWARE: + code = riscv_pmu->map_hw_event(attr->config); + break; + case PERF_TYPE_HW_CACHE: + code = riscv_pmu->map_cache_event(attr->config); + break; + case PERF_TYPE_RAW: + return -EOPNOTSUPP; + default: + return -ENOENT; + } + + event->destroy = riscv_event_destroy; + if (code < 0) { + event->destroy(event); + return code; + } + + /* + * idx is set to -1 because the index of a general event should not be + * decided until binding to some counter in pmu->add(). + * + * But since we don't have such support, later in pmu->add(), we just + * use hwc->config as the index instead. + */ + hwc->config = code; + hwc->idx = -1; + + return 0; +} + +/* + * Initialization + */ + +static struct pmu min_pmu = { + .name = "riscv-base", + .event_init = riscv_event_init, + .add = riscv_pmu_add, + .del = riscv_pmu_del, + .start = riscv_pmu_start, + .stop = riscv_pmu_stop, + .read = riscv_pmu_read, +}; + +static const struct riscv_pmu riscv_base_pmu = { + .pmu = &min_pmu, + .max_events = ARRAY_SIZE(riscv_hw_event_map), + .map_hw_event = riscv_map_hw_event, + .hw_events = riscv_hw_event_map, + .map_cache_event = riscv_map_cache_event, + .cache_events = &riscv_cache_event_map, + .counter_width = 63, + .num_counters = RISCV_BASE_COUNTERS + 0, + .handle_irq = &riscv_base_pmu_handle_irq, + + /* This means this PMU has no IRQ. */ + .irq = -1, +}; + +static const struct of_device_id riscv_pmu_of_ids[] = { + {.compatible = "riscv,base-pmu", .data = &riscv_base_pmu}, + { /* sentinel value */ } +}; + +int __init init_hw_perf_events(void) +{ + struct device_node *node = of_find_node_by_type(NULL, "pmu"); + const struct of_device_id *of_id; + + riscv_pmu = &riscv_base_pmu; + + if (node) { + of_id = of_match_node(riscv_pmu_of_ids, node); + + if (of_id) + riscv_pmu = of_id->data; + } + + perf_pmu_register(riscv_pmu->pmu, "cpu", PERF_TYPE_RAW); + return 0; +} +arch_initcall(init_hw_perf_events); diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c index 551734248748..f247d6d2137c 100644 --- a/arch/riscv/kernel/riscv_ksyms.c +++ b/arch/riscv/kernel/riscv_ksyms.c @@ -13,6 +13,7 @@ * Assembly functions that may be used (directly or indirectly) by modules */ EXPORT_SYMBOL(__clear_user); -EXPORT_SYMBOL(__copy_user); +EXPORT_SYMBOL(__asm_copy_to_user); +EXPORT_SYMBOL(__asm_copy_from_user); EXPORT_SYMBOL(memset); EXPORT_SYMBOL(memcpy); diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index b99d9dd21fd0..81a1952015a6 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -148,7 +148,7 @@ int is_valid_bugaddr(unsigned long pc) if (pc < PAGE_OFFSET) return 0; - if (probe_kernel_address((bug_insn_t __user *)pc, insn)) + if (probe_kernel_address((bug_insn_t *)pc, insn)) return 0; return (insn == __BUG_INSN); } diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S index 58fb2877c865..399e6f0c2d98 100644 --- a/arch/riscv/lib/uaccess.S +++ b/arch/riscv/lib/uaccess.S @@ -13,7 +13,8 @@ _epc: .previous .endm -ENTRY(__copy_user) +ENTRY(__asm_copy_to_user) +ENTRY(__asm_copy_from_user) /* Enable access to user memory */ li t6, SR_SUM @@ -63,7 +64,8 @@ ENTRY(__copy_user) addi a0, a0, 1 bltu a1, a3, 5b j 3b -ENDPROC(__copy_user) +ENDPROC(__asm_copy_to_user) +ENDPROC(__asm_copy_from_user) ENTRY(__clear_user) @@ -84,7 +86,7 @@ ENTRY(__clear_user) bgeu t0, t1, 2f bltu a0, t0, 4f 1: - fixup REG_S, zero, (a0), 10f + fixup REG_S, zero, (a0), 11f addi a0, a0, SZREG bltu a0, t1, 1b 2: @@ -96,12 +98,12 @@ ENTRY(__clear_user) li a0, 0 ret 4: /* Edge case: unalignment */ - fixup sb, zero, (a0), 10f + fixup sb, zero, (a0), 11f addi a0, a0, 1 bltu a0, t0, 4b j 1b 5: /* Edge case: remainder */ - fixup sb, zero, (a0), 10f + fixup sb, zero, (a0), 11f addi a0, a0, 1 bltu a0, a3, 5b j 3b @@ -109,9 +111,14 @@ ENDPROC(__clear_user) .section .fixup,"ax" .balign 4 + /* Fixup code for __copy_user(10) and __clear_user(11) */ 10: /* Disable access to user memory */ csrs sstatus, t6 - sub a0, a3, a0 + mv a0, a2 + ret +11: + csrs sstatus, t6 + mv a0, a1 ret .previous diff --git a/arch/um/drivers/vector_kern.c b/arch/um/drivers/vector_kern.c index 627075e6d875..50ee3bb5a63a 100644 --- a/arch/um/drivers/vector_kern.c +++ b/arch/um/drivers/vector_kern.c @@ -188,7 +188,7 @@ static int get_transport_options(struct arglist *def) if (strncmp(transport, TRANS_TAP, TRANS_TAP_LEN) == 0) return (vec_rx | VECTOR_BPF); if (strncmp(transport, TRANS_RAW, TRANS_RAW_LEN) == 0) - return (vec_rx | vec_tx); + return (vec_rx | vec_tx | VECTOR_QDISC_BYPASS); return (vec_rx | vec_tx); } @@ -504,15 +504,19 @@ static struct vector_queue *create_queue( result = kmalloc(sizeof(struct vector_queue), GFP_KERNEL); if (result == NULL) - goto out_fail; + return NULL; result->max_depth = max_size; result->dev = vp->dev; result->mmsg_vector = kmalloc( (sizeof(struct mmsghdr) * max_size), GFP_KERNEL); + if (result->mmsg_vector == NULL) + goto out_mmsg_fail; result->skbuff_vector = kmalloc( (sizeof(void *) * max_size), GFP_KERNEL); - if (result->mmsg_vector == NULL || result->skbuff_vector == NULL) - goto out_fail; + if (result->skbuff_vector == NULL) + goto out_skb_fail; + + /* further failures can be handled safely by destroy_queue*/ mmsg_vector = result->mmsg_vector; for (i = 0; i < max_size; i++) { @@ -563,6 +567,11 @@ static struct vector_queue *create_queue( result->head = 0; result->tail = 0; return result; +out_skb_fail: + kfree(result->mmsg_vector); +out_mmsg_fail: + kfree(result); + return NULL; out_fail: destroy_queue(result); return NULL; @@ -1232,9 +1241,8 @@ static int vector_net_open(struct net_device *dev) if ((vp->options & VECTOR_QDISC_BYPASS) != 0) { if (!uml_raw_enable_qdisc_bypass(vp->fds->rx_fd)) - vp->options = vp->options | VECTOR_BPF; + vp->options |= VECTOR_BPF; } - if ((vp->options & VECTOR_BPF) != 0) vp->bpf = uml_vector_default_bpf(vp->fds->rx_fd, dev->dev_addr); diff --git a/arch/um/include/asm/common.lds.S b/arch/um/include/asm/common.lds.S index b30d73ca29d0..7adb4e6b658a 100644 --- a/arch/um/include/asm/common.lds.S +++ b/arch/um/include/asm/common.lds.S @@ -53,12 +53,6 @@ CON_INITCALL } - .uml.initcall.init : { - __uml_initcall_start = .; - *(.uml.initcall.init) - __uml_initcall_end = .; - } - SECURITY_INIT .exitcall : { diff --git a/arch/um/include/shared/init.h b/arch/um/include/shared/init.h index b3f5865a92c9..c66de434a983 100644 --- a/arch/um/include/shared/init.h +++ b/arch/um/include/shared/init.h @@ -64,14 +64,10 @@ struct uml_param { int (*setup_func)(char *, int *); }; -extern initcall_t __uml_initcall_start, __uml_initcall_end; extern initcall_t __uml_postsetup_start, __uml_postsetup_end; extern const char *__uml_help_start, *__uml_help_end; #endif -#define __uml_initcall(fn) \ - static initcall_t __uml_initcall_##fn __uml_init_call = fn - #define __uml_exitcall(fn) \ static exitcall_t __uml_exitcall_##fn __uml_exit_call = fn @@ -108,7 +104,6 @@ extern struct uml_param __uml_setup_start, __uml_setup_end; */ #define __uml_init_setup __used __section(.uml.setup.init) #define __uml_setup_help __used __section(.uml.help.init) -#define __uml_init_call __used __section(.uml.initcall.init) #define __uml_postsetup_call __used __section(.uml.postsetup.init) #define __uml_exit_call __used __section(.uml.exitcall.exit) diff --git a/arch/um/os-Linux/main.c b/arch/um/os-Linux/main.c index 5f970ece5ac3..f1fee2b91239 100644 --- a/arch/um/os-Linux/main.c +++ b/arch/um/os-Linux/main.c @@ -40,17 +40,6 @@ static void set_stklim(void) } } -static __init void do_uml_initcalls(void) -{ - initcall_t *call; - - call = &__uml_initcall_start; - while (call < &__uml_initcall_end) { - (*call)(); - call++; - } -} - static void last_ditch_exit(int sig) { uml_cleanup(); @@ -151,7 +140,6 @@ int __init main(int argc, char **argv, char **envp) scan_elf_aux(envp); #endif - do_uml_initcalls(); change_sig(SIGPIPE, 0); ret = linux_main(argc, argv); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d0dd35d582da..559a12b6184d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4429,16 +4429,14 @@ static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) goto out_vmcs; memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); -#if IS_ENABLED(CONFIG_HYPERV) - if (static_branch_unlikely(&enable_evmcs) && + if (IS_ENABLED(CONFIG_HYPERV) && + static_branch_unlikely(&enable_evmcs) && (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)) { struct hv_enlightened_vmcs *evmcs = (struct hv_enlightened_vmcs *)loaded_vmcs->vmcs; evmcs->hv_enlightenments_control.msr_bitmap = 1; } -#endif - } return 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6bcecc325e7e..0046aa70205a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8567,7 +8567,7 @@ int kvm_arch_hardware_setup(void) /* * Make sure the user can only configure tsc_khz values that * fit into a signed integer. - * A min value is not calculated needed because it will always + * A min value is not calculated because it will always * be 1 on all machines. */ u64 max = min(0x7fffffffULL, diff --git a/drivers/gpu/drm/shmobile/Kconfig b/drivers/gpu/drm/shmobile/Kconfig index c987c826daa3..0426d66660d1 100644 --- a/drivers/gpu/drm/shmobile/Kconfig +++ b/drivers/gpu/drm/shmobile/Kconfig @@ -2,7 +2,6 @@ config DRM_SHMOBILE tristate "DRM Support for SH Mobile" depends on DRM && ARM depends on ARCH_SHMOBILE || COMPILE_TEST - depends on FB_SH_MOBILE_MERAM || !FB_SH_MOBILE_MERAM select BACKLIGHT_CLASS_DEVICE select BACKLIGHT_LCD_SUPPORT select DRM_KMS_HELPER diff --git a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c index e7738939a86d..40df8887fc17 100644 --- a/drivers/gpu/drm/shmobile/shmob_drm_crtc.c +++ b/drivers/gpu/drm/shmobile/shmob_drm_crtc.c @@ -21,8 +21,6 @@ #include <drm/drm_gem_cma_helper.h> #include <drm/drm_plane_helper.h> -#include <video/sh_mobile_meram.h> - #include "shmob_drm_backlight.h" #include "shmob_drm_crtc.h" #include "shmob_drm_drv.h" @@ -47,20 +45,12 @@ static int shmob_drm_clk_on(struct shmob_drm_device *sdev) if (ret < 0) return ret; } -#if 0 - if (sdev->meram_dev && sdev->meram_dev->pdev) - pm_runtime_get_sync(&sdev->meram_dev->pdev->dev); -#endif return 0; } static void shmob_drm_clk_off(struct shmob_drm_device *sdev) { -#if 0 - if (sdev->meram_dev && sdev->meram_dev->pdev) - pm_runtime_put_sync(&sdev->meram_dev->pdev->dev); -#endif if (sdev->clock) clk_disable_unprepare(sdev->clock); } @@ -269,12 +259,6 @@ static void shmob_drm_crtc_stop(struct shmob_drm_crtc *scrtc) if (!scrtc->started) return; - /* Disable the MERAM cache. */ - if (scrtc->cache) { - sh_mobile_meram_cache_free(sdev->meram, scrtc->cache); - scrtc->cache = NULL; - } - /* Stop the LCDC. */ shmob_drm_crtc_start_stop(scrtc, false); @@ -305,7 +289,6 @@ static void shmob_drm_crtc_compute_base(struct shmob_drm_crtc *scrtc, { struct drm_crtc *crtc = &scrtc->crtc; struct drm_framebuffer *fb = crtc->primary->fb; - struct shmob_drm_device *sdev = crtc->dev->dev_private; struct drm_gem_cma_object *gem; unsigned int bpp; @@ -321,11 +304,6 @@ static void shmob_drm_crtc_compute_base(struct shmob_drm_crtc *scrtc, + y / (bpp == 4 ? 2 : 1) * fb->pitches[1] + x * (bpp == 16 ? 2 : 1); } - - if (scrtc->cache) - sh_mobile_meram_cache_update(sdev->meram, scrtc->cache, - scrtc->dma[0], scrtc->dma[1], - &scrtc->dma[0], &scrtc->dma[1]); } static void shmob_drm_crtc_update_base(struct shmob_drm_crtc *scrtc) @@ -372,9 +350,7 @@ static int shmob_drm_crtc_mode_set(struct drm_crtc *crtc, { struct shmob_drm_crtc *scrtc = to_shmob_crtc(crtc); struct shmob_drm_device *sdev = crtc->dev->dev_private; - const struct sh_mobile_meram_cfg *mdata = sdev->pdata->meram; const struct shmob_drm_format_info *format; - void *cache; format = shmob_drm_format_info(crtc->primary->fb->format->format); if (format == NULL) { @@ -386,24 +362,6 @@ static int shmob_drm_crtc_mode_set(struct drm_crtc *crtc, scrtc->format = format; scrtc->line_size = crtc->primary->fb->pitches[0]; - if (sdev->meram) { - /* Enable MERAM cache if configured. We need to de-init - * configured ICBs before we can re-initialize them. - */ - if (scrtc->cache) { - sh_mobile_meram_cache_free(sdev->meram, scrtc->cache); - scrtc->cache = NULL; - } - - cache = sh_mobile_meram_cache_alloc(sdev->meram, mdata, - crtc->primary->fb->pitches[0], - adjusted_mode->vdisplay, - format->meram, - &scrtc->line_size); - if (!IS_ERR(cache)) - scrtc->cache = cache; - } - shmob_drm_crtc_compute_base(scrtc, x, y); return 0; diff --git a/drivers/gpu/drm/shmobile/shmob_drm_crtc.h b/drivers/gpu/drm/shmobile/shmob_drm_crtc.h index f152973df11c..c11f421737dc 100644 --- a/drivers/gpu/drm/shmobile/shmob_drm_crtc.h +++ b/drivers/gpu/drm/shmobile/shmob_drm_crtc.h @@ -28,7 +28,6 @@ struct shmob_drm_crtc { int dpms; const struct shmob_drm_format_info *format; - void *cache; unsigned long dma[2]; unsigned int line_size; bool started; diff --git a/drivers/gpu/drm/shmobile/shmob_drm_drv.h b/drivers/gpu/drm/shmobile/shmob_drm_drv.h index 02ea315ba69a..088a6e55fa29 100644 --- a/drivers/gpu/drm/shmobile/shmob_drm_drv.h +++ b/drivers/gpu/drm/shmobile/shmob_drm_drv.h @@ -23,7 +23,6 @@ struct clk; struct device; struct drm_device; -struct sh_mobile_meram_info; struct shmob_drm_device { struct device *dev; @@ -31,7 +30,6 @@ struct shmob_drm_device { void __iomem *mmio; struct clk *clock; - struct sh_mobile_meram_info *meram; u32 lddckr; u32 ldmt1r; diff --git a/drivers/gpu/drm/shmobile/shmob_drm_kms.c b/drivers/gpu/drm/shmobile/shmob_drm_kms.c index d36919b14da7..447638581c08 100644 --- a/drivers/gpu/drm/shmobile/shmob_drm_kms.c +++ b/drivers/gpu/drm/shmobile/shmob_drm_kms.c @@ -18,8 +18,6 @@ #include <drm/drm_gem_cma_helper.h> #include <drm/drm_gem_framebuffer_helper.h> -#include <video/sh_mobile_meram.h> - #include "shmob_drm_crtc.h" #include "shmob_drm_drv.h" #include "shmob_drm_kms.h" @@ -35,55 +33,46 @@ static const struct shmob_drm_format_info shmob_drm_format_infos[] = { .bpp = 16, .yuv = false, .lddfr = LDDFR_PKF_RGB16, - .meram = SH_MOBILE_MERAM_PF_RGB, }, { .fourcc = DRM_FORMAT_RGB888, .bpp = 24, .yuv = false, .lddfr = LDDFR_PKF_RGB24, - .meram = SH_MOBILE_MERAM_PF_RGB, }, { .fourcc = DRM_FORMAT_ARGB8888, .bpp = 32, .yuv = false, .lddfr = LDDFR_PKF_ARGB32, - .meram = SH_MOBILE_MERAM_PF_RGB, }, { .fourcc = DRM_FORMAT_NV12, .bpp = 12, .yuv = true, .lddfr = LDDFR_CC | LDDFR_YF_420, - .meram = SH_MOBILE_MERAM_PF_NV, }, { .fourcc = DRM_FORMAT_NV21, .bpp = 12, .yuv = true, .lddfr = LDDFR_CC | LDDFR_YF_420, - .meram = SH_MOBILE_MERAM_PF_NV, }, { .fourcc = DRM_FORMAT_NV16, .bpp = 16, .yuv = true, .lddfr = LDDFR_CC | LDDFR_YF_422, - .meram = SH_MOBILE_MERAM_PF_NV, }, { .fourcc = DRM_FORMAT_NV61, .bpp = 16, .yuv = true, .lddfr = LDDFR_CC | LDDFR_YF_422, - .meram = SH_MOBILE_MERAM_PF_NV, }, { .fourcc = DRM_FORMAT_NV24, .bpp = 24, .yuv = true, .lddfr = LDDFR_CC | LDDFR_YF_444, - .meram = SH_MOBILE_MERAM_PF_NV24, }, { .fourcc = DRM_FORMAT_NV42, .bpp = 24, .yuv = true, .lddfr = LDDFR_CC | LDDFR_YF_444, - .meram = SH_MOBILE_MERAM_PF_NV24, }, }; diff --git a/drivers/gpu/drm/shmobile/shmob_drm_kms.h b/drivers/gpu/drm/shmobile/shmob_drm_kms.h index 06d5b7caa026..753e2817dc2c 100644 --- a/drivers/gpu/drm/shmobile/shmob_drm_kms.h +++ b/drivers/gpu/drm/shmobile/shmob_drm_kms.h @@ -24,7 +24,6 @@ struct shmob_drm_format_info { unsigned int bpp; bool yuv; u32 lddfr; - unsigned int meram; }; const struct shmob_drm_format_info *shmob_drm_format_info(u32 fourcc); diff --git a/drivers/gpu/drm/shmobile/shmob_drm_plane.c b/drivers/gpu/drm/shmobile/shmob_drm_plane.c index 97f6e4a3eb0d..1d0359f713ca 100644 --- a/drivers/gpu/drm/shmobile/shmob_drm_plane.c +++ b/drivers/gpu/drm/shmobile/shmob_drm_plane.c @@ -17,8 +17,6 @@ #include <drm/drm_fb_cma_helper.h> #include <drm/drm_gem_cma_helper.h> -#include <video/sh_mobile_meram.h> - #include "shmob_drm_drv.h" #include "shmob_drm_kms.h" #include "shmob_drm_plane.h" diff --git a/drivers/media/platform/via-camera.c b/drivers/media/platform/via-camera.c index f01c3e813247..c8bb82fe0b9d 100644 --- a/drivers/media/platform/via-camera.c +++ b/drivers/media/platform/via-camera.c @@ -27,7 +27,12 @@ #include <linux/via-core.h> #include <linux/via-gpio.h> #include <linux/via_i2c.h> + +#ifdef CONFIG_X86 #include <asm/olpc.h> +#else +#define machine_is_olpc(x) 0 +#endif #include "via-camera.h" diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h index 448d1fafc827..f4d81765221e 100644 --- a/drivers/net/ethernet/cavium/thunder/nic.h +++ b/drivers/net/ethernet/cavium/thunder/nic.h @@ -325,6 +325,8 @@ struct nicvf { struct tasklet_struct qs_err_task; struct work_struct reset_task; struct nicvf_work rx_mode_work; + /* spinlock to protect workqueue arguments from concurrent access */ + spinlock_t rx_mode_wq_lock; /* PTP timestamp */ struct cavium_ptp *ptp_clock; diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 7135db45927e..135766c4296b 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -1923,17 +1923,12 @@ static int nicvf_ioctl(struct net_device *netdev, struct ifreq *req, int cmd) } } -static void nicvf_set_rx_mode_task(struct work_struct *work_arg) +static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs, + struct nicvf *nic) { - struct nicvf_work *vf_work = container_of(work_arg, struct nicvf_work, - work.work); - struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work); union nic_mbx mbx = {}; int idx; - if (!vf_work) - return; - /* From the inside of VM code flow we have only 128 bits memory * available to send message to host's PF, so send all mc addrs * one by one, starting from flush command in case if kernel @@ -1944,7 +1939,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg) mbx.xcast.msg = NIC_MBOX_MSG_RESET_XCAST; nicvf_send_msg_to_pf(nic, &mbx); - if (vf_work->mode & BGX_XCAST_MCAST_FILTER) { + if (mode & BGX_XCAST_MCAST_FILTER) { /* once enabling filtering, we need to signal to PF to add * its' own LMAC to the filter to accept packets for it. */ @@ -1954,23 +1949,46 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg) } /* check if we have any specific MACs to be added to PF DMAC filter */ - if (vf_work->mc) { + if (mc_addrs) { /* now go through kernel list of MACs and add them one by one */ - for (idx = 0; idx < vf_work->mc->count; idx++) { + for (idx = 0; idx < mc_addrs->count; idx++) { mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST; - mbx.xcast.data.mac = vf_work->mc->mc[idx]; + mbx.xcast.data.mac = mc_addrs->mc[idx]; nicvf_send_msg_to_pf(nic, &mbx); } - kfree(vf_work->mc); + kfree(mc_addrs); } /* and finally set rx mode for PF accordingly */ mbx.xcast.msg = NIC_MBOX_MSG_SET_XCAST; - mbx.xcast.data.mode = vf_work->mode; + mbx.xcast.data.mode = mode; nicvf_send_msg_to_pf(nic, &mbx); } +static void nicvf_set_rx_mode_task(struct work_struct *work_arg) +{ + struct nicvf_work *vf_work = container_of(work_arg, struct nicvf_work, + work.work); + struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work); + u8 mode; + struct xcast_addr_list *mc; + + if (!vf_work) + return; + + /* Save message data locally to prevent them from + * being overwritten by next ndo_set_rx_mode call(). + */ + spin_lock(&nic->rx_mode_wq_lock); + mode = vf_work->mode; + mc = vf_work->mc; + vf_work->mc = NULL; + spin_unlock(&nic->rx_mode_wq_lock); + + __nicvf_set_rx_mode_task(mode, mc, nic); +} + static void nicvf_set_rx_mode(struct net_device *netdev) { struct nicvf *nic = netdev_priv(netdev); @@ -2004,9 +2022,12 @@ static void nicvf_set_rx_mode(struct net_device *netdev) } } } + spin_lock(&nic->rx_mode_wq_lock); + kfree(nic->rx_mode_work.mc); nic->rx_mode_work.mc = mc_list; nic->rx_mode_work.mode = mode; - queue_delayed_work(nicvf_rx_mode_wq, &nic->rx_mode_work.work, 2 * HZ); + queue_delayed_work(nicvf_rx_mode_wq, &nic->rx_mode_work.work, 0); + spin_unlock(&nic->rx_mode_wq_lock); } static const struct net_device_ops nicvf_netdev_ops = { @@ -2163,6 +2184,7 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) INIT_WORK(&nic->reset_task, nicvf_reset_task); INIT_DELAYED_WORK(&nic->rx_mode_work.work, nicvf_set_rx_mode_task); + spin_lock_init(&nic->rx_mode_wq_lock); err = register_netdev(netdev); if (err) { diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c index 2edfdbdaae48..7b795edd9d3a 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -3362,10 +3362,17 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) err = sysfs_create_group(&adapter->port[0]->dev.kobj, &cxgb3_attr_group); + if (err) { + dev_err(&pdev->dev, "cannot create sysfs group\n"); + goto out_close_led; + } print_port_info(adapter, ai); return 0; +out_close_led: + t3_set_reg_field(adapter, A_T3DBG_GPIO_EN, F_GPIO0_OUT_VAL, 0); + out_free_dev: iounmap(adapter->regs); for (i = ai->nports0 + ai->nports1 - 1; i >= 0; --i) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index fc534e91c6b2..144d5fe6b944 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -760,9 +760,9 @@ struct ixgbe_adapter { #define IXGBE_RSS_KEY_SIZE 40 /* size of RSS Hash Key in bytes */ u32 *rss_key; -#ifdef CONFIG_XFRM +#ifdef CONFIG_XFRM_OFFLOAD struct ixgbe_ipsec *ipsec; -#endif /* CONFIG_XFRM */ +#endif /* CONFIG_XFRM_OFFLOAD */ }; static inline u8 ixgbe_max_rss_indices(struct ixgbe_adapter *adapter) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c index 344a1f213a5f..c116f459945d 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c @@ -158,7 +158,16 @@ static void ixgbe_ipsec_stop_data(struct ixgbe_adapter *adapter) reg |= IXGBE_SECRXCTRL_RX_DIS; IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, reg); - IXGBE_WRITE_FLUSH(hw); + /* If both Tx and Rx are ready there are no packets + * that we need to flush so the loopback configuration + * below is not necessary. + */ + t_rdy = IXGBE_READ_REG(hw, IXGBE_SECTXSTAT) & + IXGBE_SECTXSTAT_SECTX_RDY; + r_rdy = IXGBE_READ_REG(hw, IXGBE_SECRXSTAT) & + IXGBE_SECRXSTAT_SECRX_RDY; + if (t_rdy && r_rdy) + return; /* If the tx fifo doesn't have link, but still has data, * we can't clear the tx sec block. Set the MAC loopback @@ -185,7 +194,7 @@ static void ixgbe_ipsec_stop_data(struct ixgbe_adapter *adapter) IXGBE_SECTXSTAT_SECTX_RDY; r_rdy = IXGBE_READ_REG(hw, IXGBE_SECRXSTAT) & IXGBE_SECRXSTAT_SECRX_RDY; - } while (!t_rdy && !r_rdy && limit--); + } while (!(t_rdy && r_rdy) && limit--); /* undo loopback if we played with it earlier */ if (!link) { @@ -966,10 +975,22 @@ void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring, **/ void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter) { + struct ixgbe_hw *hw = &adapter->hw; struct ixgbe_ipsec *ipsec; + u32 t_dis, r_dis; size_t size; - if (adapter->hw.mac.type == ixgbe_mac_82598EB) + if (hw->mac.type == ixgbe_mac_82598EB) + return; + + /* If there is no support for either Tx or Rx offload + * we should not be advertising support for IPsec. + */ + t_dis = IXGBE_READ_REG(hw, IXGBE_SECTXSTAT) & + IXGBE_SECTXSTAT_SECTX_OFF_DIS; + r_dis = IXGBE_READ_REG(hw, IXGBE_SECRXSTAT) & + IXGBE_SECRXSTAT_SECRX_OFF_DIS; + if (t_dis || r_dis) return; ipsec = kzalloc(sizeof(*ipsec), GFP_KERNEL); @@ -1001,13 +1022,6 @@ void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter) adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops; -#define IXGBE_ESP_FEATURES (NETIF_F_HW_ESP | \ - NETIF_F_HW_ESP_TX_CSUM | \ - NETIF_F_GSO_ESP) - - adapter->netdev->features |= IXGBE_ESP_FEATURES; - adapter->netdev->hw_enc_features |= IXGBE_ESP_FEATURES; - return; err2: diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c index 893a9206e718..d361f570ca37 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c @@ -593,6 +593,14 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter) } #endif + /* To support macvlan offload we have to use num_tc to + * restrict the queues that can be used by the device. + * By doing this we can avoid reporting a false number of + * queues. + */ + if (vmdq_i > 1) + netdev_set_num_tc(adapter->netdev, 1); + /* populate TC0 for use by pool 0 */ netdev_set_tc_queue(adapter->netdev, 0, adapter->num_rx_queues_per_pool, 0); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 0b1ba3ae159c..3e87dbbc9024 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6117,6 +6117,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter, #ifdef CONFIG_IXGBE_DCB ixgbe_init_dcb(adapter); #endif + ixgbe_init_ipsec_offload(adapter); /* default flow control settings */ hw->fc.requested_mode = ixgbe_fc_full; @@ -8822,14 +8823,6 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc) } else { netdev_reset_tc(dev); - /* To support macvlan offload we have to use num_tc to - * restrict the queues that can be used by the device. - * By doing this we can avoid reporting a false number of - * queues. - */ - if (!tc && adapter->num_rx_pools > 1) - netdev_set_num_tc(dev, 1); - if (adapter->hw.mac.type == ixgbe_mac_82598EB) adapter->hw.fc.requested_mode = adapter->last_lfc_mode; @@ -9904,7 +9897,7 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev, * the TSO, so it's the exception. */ if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID)) { -#ifdef CONFIG_XFRM +#ifdef CONFIG_XFRM_OFFLOAD if (!skb->sp) #endif features &= ~NETIF_F_TSO; @@ -10437,6 +10430,14 @@ skip_sriov: if (hw->mac.type >= ixgbe_mac_82599EB) netdev->features |= NETIF_F_SCTP_CRC; +#ifdef CONFIG_XFRM_OFFLOAD +#define IXGBE_ESP_FEATURES (NETIF_F_HW_ESP | \ + NETIF_F_HW_ESP_TX_CSUM | \ + NETIF_F_GSO_ESP) + + if (adapter->ipsec) + netdev->features |= IXGBE_ESP_FEATURES; +#endif /* copy netdev features into list of user selectable features */ netdev->hw_features |= netdev->features | NETIF_F_HW_VLAN_CTAG_FILTER | @@ -10499,8 +10500,6 @@ skip_sriov: NETIF_F_FCOE_MTU; } #endif /* IXGBE_FCOE */ - ixgbe_init_ipsec_offload(adapter); - if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE) netdev->hw_features |= NETIF_F_LRO; if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h index e8ed37749ab1..44cfb2021145 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h @@ -599,13 +599,15 @@ struct ixgbe_nvm_version { #define IXGBE_SECTXCTRL_STORE_FORWARD 0x00000004 #define IXGBE_SECTXSTAT_SECTX_RDY 0x00000001 -#define IXGBE_SECTXSTAT_ECC_TXERR 0x00000002 +#define IXGBE_SECTXSTAT_SECTX_OFF_DIS 0x00000002 +#define IXGBE_SECTXSTAT_ECC_TXERR 0x00000004 #define IXGBE_SECRXCTRL_SECRX_DIS 0x00000001 #define IXGBE_SECRXCTRL_RX_DIS 0x00000002 #define IXGBE_SECRXSTAT_SECRX_RDY 0x00000001 -#define IXGBE_SECRXSTAT_ECC_RXERR 0x00000002 +#define IXGBE_SECRXSTAT_SECRX_OFF_DIS 0x00000002 +#define IXGBE_SECRXSTAT_ECC_RXERR 0x00000004 /* LinkSec (MacSec) Registers */ #define IXGBE_LSECTXCAP 0x08A00 diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 77b2adb29341..6aaaf3d9ba31 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -4756,12 +4756,6 @@ static void mlxsw_sp_rt6_destroy(struct mlxsw_sp_rt6 *mlxsw_sp_rt6) kfree(mlxsw_sp_rt6); } -static bool mlxsw_sp_fib6_rt_can_mp(const struct fib6_info *rt) -{ - /* RTF_CACHE routes are ignored */ - return (rt->fib6_flags & (RTF_GATEWAY | RTF_ADDRCONF)) == RTF_GATEWAY; -} - static struct fib6_info * mlxsw_sp_fib6_entry_rt(const struct mlxsw_sp_fib6_entry *fib6_entry) { @@ -4771,11 +4765,11 @@ mlxsw_sp_fib6_entry_rt(const struct mlxsw_sp_fib6_entry *fib6_entry) static struct mlxsw_sp_fib6_entry * mlxsw_sp_fib6_node_mp_entry_find(const struct mlxsw_sp_fib_node *fib_node, - const struct fib6_info *nrt, bool replace) + const struct fib6_info *nrt, bool append) { struct mlxsw_sp_fib6_entry *fib6_entry; - if (!mlxsw_sp_fib6_rt_can_mp(nrt) || replace) + if (!append) return NULL; list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) { @@ -4790,8 +4784,7 @@ mlxsw_sp_fib6_node_mp_entry_find(const struct mlxsw_sp_fib_node *fib_node, break; if (rt->fib6_metric < nrt->fib6_metric) continue; - if (rt->fib6_metric == nrt->fib6_metric && - mlxsw_sp_fib6_rt_can_mp(rt)) + if (rt->fib6_metric == nrt->fib6_metric) return fib6_entry; if (rt->fib6_metric > nrt->fib6_metric) break; @@ -5170,7 +5163,7 @@ static struct mlxsw_sp_fib6_entry * mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node, const struct fib6_info *nrt, bool replace) { - struct mlxsw_sp_fib6_entry *fib6_entry, *fallback = NULL; + struct mlxsw_sp_fib6_entry *fib6_entry; list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) { struct fib6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry); @@ -5179,18 +5172,13 @@ mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node, continue; if (rt->fib6_table->tb6_id != nrt->fib6_table->tb6_id) break; - if (replace && rt->fib6_metric == nrt->fib6_metric) { - if (mlxsw_sp_fib6_rt_can_mp(rt) == - mlxsw_sp_fib6_rt_can_mp(nrt)) - return fib6_entry; - if (mlxsw_sp_fib6_rt_can_mp(nrt)) - fallback = fallback ?: fib6_entry; - } + if (replace && rt->fib6_metric == nrt->fib6_metric) + return fib6_entry; if (rt->fib6_metric > nrt->fib6_metric) - return fallback ?: fib6_entry; + return fib6_entry; } - return fallback; + return NULL; } static int @@ -5316,7 +5304,8 @@ static void mlxsw_sp_fib6_entry_replace(struct mlxsw_sp *mlxsw_sp, } static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp, - struct fib6_info *rt, bool replace) + struct fib6_info *rt, bool replace, + bool append) { struct mlxsw_sp_fib6_entry *fib6_entry; struct mlxsw_sp_fib_node *fib_node; @@ -5342,7 +5331,7 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp, /* Before creating a new entry, try to append route to an existing * multipath entry. */ - fib6_entry = mlxsw_sp_fib6_node_mp_entry_find(fib_node, rt, replace); + fib6_entry = mlxsw_sp_fib6_node_mp_entry_find(fib_node, rt, append); if (fib6_entry) { err = mlxsw_sp_fib6_entry_nexthop_add(mlxsw_sp, fib6_entry, rt); if (err) @@ -5350,6 +5339,14 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp, return 0; } + /* We received an append event, yet did not find any route to + * append to. + */ + if (WARN_ON(append)) { + err = -EINVAL; + goto err_fib6_entry_append; + } + fib6_entry = mlxsw_sp_fib6_entry_create(mlxsw_sp, fib_node, rt); if (IS_ERR(fib6_entry)) { err = PTR_ERR(fib6_entry); @@ -5367,6 +5364,7 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp, err_fib6_node_entry_link: mlxsw_sp_fib6_entry_destroy(mlxsw_sp, fib6_entry); err_fib6_entry_create: +err_fib6_entry_append: err_fib6_entry_nexthop_add: mlxsw_sp_fib_node_put(mlxsw_sp, fib_node); return err; @@ -5717,7 +5715,7 @@ static void mlxsw_sp_router_fib6_event_work(struct work_struct *work) struct mlxsw_sp_fib_event_work *fib_work = container_of(work, struct mlxsw_sp_fib_event_work, work); struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp; - bool replace; + bool replace, append; int err; rtnl_lock(); @@ -5728,8 +5726,10 @@ static void mlxsw_sp_router_fib6_event_work(struct work_struct *work) case FIB_EVENT_ENTRY_APPEND: /* fall through */ case FIB_EVENT_ENTRY_ADD: replace = fib_work->event == FIB_EVENT_ENTRY_REPLACE; + append = fib_work->event == FIB_EVENT_ENTRY_APPEND; err = mlxsw_sp_router_fib6_add(mlxsw_sp, - fib_work->fen6_info.rt, replace); + fib_work->fen6_info.rt, replace, + append); if (err) mlxsw_sp_router_fib_abort(mlxsw_sp); mlxsw_sp_rt6_release(fib_work->fen6_info.rt); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c index e97652c40d13..eea5666a86b2 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -1018,8 +1018,10 @@ mlxsw_sp_port_vlan_bridge_join(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan, int err; /* No need to continue if only VLAN flags were changed */ - if (mlxsw_sp_port_vlan->bridge_port) + if (mlxsw_sp_port_vlan->bridge_port) { + mlxsw_sp_port_vlan_put(mlxsw_sp_port_vlan); return 0; + } err = mlxsw_sp_port_vlan_fid_join(mlxsw_sp_port_vlan, bridge_port); if (err) diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c index 19cfa162ac65..1decf3a1cad3 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/main.c +++ b/drivers/net/ethernet/netronome/nfp/flower/main.c @@ -455,6 +455,7 @@ static int nfp_flower_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, eth_hw_addr_random(nn->dp.netdev); netif_keep_dst(nn->dp.netdev); + nn->vnic_no_name = true; return 0; diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c index ec524d97869d..78afe75129ab 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c +++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c @@ -381,6 +381,8 @@ nfp_tun_neigh_event_handler(struct notifier_block *nb, unsigned long event, err = PTR_ERR_OR_ZERO(rt); if (err) return NOTIFY_DONE; + + ip_rt_put(rt); #else return NOTIFY_DONE; #endif diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h index 57cb035dcc6d..2a71a9ffd095 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h @@ -590,6 +590,8 @@ struct nfp_net_dp { * @vnic_list: Entry on device vNIC list * @pdev: Backpointer to PCI device * @app: APP handle if available + * @vnic_no_name: For non-port PF vNIC make ndo_get_phys_port_name return + * -EOPNOTSUPP to keep backwards compatibility (set by app) * @port: Pointer to nfp_port structure if vNIC is a port * @app_priv: APP private data for this vNIC */ @@ -663,6 +665,8 @@ struct nfp_net { struct pci_dev *pdev; struct nfp_app *app; + bool vnic_no_name; + struct nfp_port *port; void *app_priv; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 75110c8d6a90..d4c27f849f9b 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -3121,7 +3121,7 @@ static void nfp_net_stat64(struct net_device *netdev, struct nfp_net *nn = netdev_priv(netdev); int r; - for (r = 0; r < nn->dp.num_r_vecs; r++) { + for (r = 0; r < nn->max_r_vecs; r++) { struct nfp_net_r_vector *r_vec = &nn->r_vecs[r]; u64 data[3]; unsigned int start; @@ -3286,7 +3286,7 @@ nfp_net_get_phys_port_name(struct net_device *netdev, char *name, size_t len) if (nn->port) return nfp_port_get_phys_port_name(netdev, name, len); - if (nn->dp.is_vf) + if (nn->dp.is_vf || nn->vnic_no_name) return -EOPNOTSUPP; n = snprintf(name, len, "n%d", nn->id); diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c index 2dd89dba9311..d32af598da90 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c @@ -98,21 +98,18 @@ struct nfp_resource { static int nfp_cpp_resource_find(struct nfp_cpp *cpp, struct nfp_resource *res) { - char name_pad[NFP_RESOURCE_ENTRY_NAME_SZ] = {}; struct nfp_resource_entry entry; u32 cpp_id, key; int ret, i; cpp_id = NFP_CPP_ID(NFP_RESOURCE_TBL_TARGET, 3, 0); /* Atomic read */ - strncpy(name_pad, res->name, sizeof(name_pad)); - /* Search for a matching entry */ - if (!memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8)) { + if (!strcmp(res->name, NFP_RESOURCE_TBL_NAME)) { nfp_err(cpp, "Grabbing device lock not supported\n"); return -EOPNOTSUPP; } - key = crc32_posix(name_pad, sizeof(name_pad)); + key = crc32_posix(res->name, NFP_RESOURCE_ENTRY_NAME_SZ); for (i = 0; i < NFP_RESOURCE_TBL_ENTRIES; i++) { u64 addr = NFP_RESOURCE_TBL_BASE + diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c index e78e5db39458..c694e3428dfc 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c @@ -384,6 +384,7 @@ int emac_sgmii_config(struct platform_device *pdev, struct emac_adapter *adpt) } sgmii_pdev = of_find_device_by_node(np); + of_node_put(np); if (!sgmii_pdev) { dev_err(&pdev->dev, "invalid internal-phy property\n"); return -ENODEV; diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c index 4ff231df7322..c5979569fd60 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c @@ -334,9 +334,10 @@ static int meson8b_dwmac_probe(struct platform_device *pdev) dwmac->data = (const struct meson8b_dwmac_data *) of_device_get_match_data(&pdev->dev); - if (!dwmac->data) - return -EINVAL; - + if (!dwmac->data) { + ret = -EINVAL; + goto err_remove_config_dt; + } res = platform_get_resource(pdev, IORESOURCE_MEM, 1); dwmac->regs = devm_ioremap_resource(&pdev->dev, res); if (IS_ERR(dwmac->regs)) { diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c index 14770fc8865e..1f50e83cafb2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/hwif.c +++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c @@ -252,13 +252,8 @@ int stmmac_hwif_init(struct stmmac_priv *priv) return ret; } - /* Run quirks, if needed */ - if (entry->quirks) { - ret = entry->quirks(priv); - if (ret) - return ret; - } - + /* Save quirks, if needed for posterior use */ + priv->hwif_quirks = entry->quirks; return 0; } diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index 025efbf6145c..76649adf8fb0 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -129,6 +129,7 @@ struct stmmac_priv { struct net_device *dev; struct device *device; struct mac_device_info *hw; + int (*hwif_quirks)(struct stmmac_priv *priv); struct mutex lock; /* RX Queue */ diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 11fb7c777d89..e79b0d7b388a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3182,17 +3182,22 @@ dma_map_err: static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb) { - struct ethhdr *ehdr; + struct vlan_ethhdr *veth; + __be16 vlan_proto; u16 vlanid; - if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) == - NETIF_F_HW_VLAN_CTAG_RX && - !__vlan_get_tag(skb, &vlanid)) { + veth = (struct vlan_ethhdr *)skb->data; + vlan_proto = veth->h_vlan_proto; + + if ((vlan_proto == htons(ETH_P_8021Q) && + dev->features & NETIF_F_HW_VLAN_CTAG_RX) || + (vlan_proto == htons(ETH_P_8021AD) && + dev->features & NETIF_F_HW_VLAN_STAG_RX)) { /* pop the vlan tag */ - ehdr = (struct ethhdr *)skb->data; - memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2); + vlanid = ntohs(veth->h_vlan_TCI); + memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2); skb_pull(skb, VLAN_HLEN); - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid); + __vlan_hwaccel_put_tag(skb, vlan_proto, vlanid); } } @@ -4130,6 +4135,13 @@ static int stmmac_hw_init(struct stmmac_priv *priv) if (priv->dma_cap.tsoen) dev_info(priv->device, "TSO supported\n"); + /* Run HW quirks, if any */ + if (priv->hwif_quirks) { + ret = priv->hwif_quirks(priv); + if (ret) + return ret; + } + return 0; } @@ -4235,7 +4247,7 @@ int stmmac_dvr_probe(struct device *device, ndev->watchdog_timeo = msecs_to_jiffies(watchdog); #ifdef STMMAC_VLAN_TAG_USED /* Both mac100 and gmac support receive VLAN tag detection */ - ndev->features |= NETIF_F_HW_VLAN_CTAG_RX; + ndev->features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_STAG_RX; #endif priv->msg_enable = netif_msg_init(debug, default_msg_level); diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c index 69e31ceccfae..2a0c06e0f730 100644 --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c @@ -123,7 +123,6 @@ * @phy_node: pointer to the PHY device node * @mii_bus: pointer to the MII bus * @last_link: last link status - * @has_mdio: indicates whether MDIO is included in the HW */ struct net_local { @@ -144,7 +143,6 @@ struct net_local { struct mii_bus *mii_bus; int last_link; - bool has_mdio; }; @@ -863,14 +861,14 @@ static int xemaclite_mdio_setup(struct net_local *lp, struct device *dev) bus->write = xemaclite_mdio_write; bus->parent = dev; - lp->mii_bus = bus; - rc = of_mdiobus_register(bus, np); if (rc) { dev_err(dev, "Failed to register mdio bus.\n"); goto err_register; } + lp->mii_bus = bus; + return 0; err_register: @@ -1145,9 +1143,7 @@ static int xemaclite_of_probe(struct platform_device *ofdev) xemaclite_update_address(lp, ndev->dev_addr); lp->phy_node = of_parse_phandle(ofdev->dev.of_node, "phy-handle", 0); - rc = xemaclite_mdio_setup(lp, &ofdev->dev); - if (rc) - dev_warn(&ofdev->dev, "error registering MDIO bus\n"); + xemaclite_mdio_setup(lp, &ofdev->dev); dev_info(dev, "MAC address is now %pM\n", ndev->dev_addr); @@ -1191,7 +1187,7 @@ static int xemaclite_of_remove(struct platform_device *of_dev) struct net_local *lp = netdev_priv(ndev); /* Un-register the mii_bus, if configured */ - if (lp->has_mdio) { + if (lp->mii_bus) { mdiobus_unregister(lp->mii_bus); mdiobus_free(lp->mii_bus); lp->mii_bus = NULL; diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig index 23a2d145813a..0765d5f61714 100644 --- a/drivers/net/hyperv/Kconfig +++ b/drivers/net/hyperv/Kconfig @@ -2,6 +2,5 @@ config HYPERV_NET tristate "Microsoft Hyper-V virtual network driver" depends on HYPERV select UCS2_STRING - select FAILOVER help Select this option to enable the Hyper-V virtual network driver. diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 23304aca25f9..1a924b867b07 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -901,6 +901,8 @@ struct net_device_context { struct hv_device *device_ctx; /* netvsc_device */ struct netvsc_device __rcu *nvdev; + /* list of netvsc net_devices */ + struct list_head list; /* reconfigure work */ struct delayed_work dwork; /* last reconfig time */ @@ -931,8 +933,6 @@ struct net_device_context { u32 vf_alloc; /* Serial number of the VF to team with */ u32 vf_serial; - - struct failover *failover; }; /* Per channel data */ @@ -1277,17 +1277,17 @@ struct ndis_lsov2_offload { struct ndis_ipsecv2_offload { u32 encap; - u16 ip6; - u16 ip4opt; - u16 ip6ext; - u16 ah; - u16 esp; - u16 ah_esp; - u16 xport; - u16 tun; - u16 xport_tun; - u16 lso; - u16 extseq; + u8 ip6; + u8 ip4opt; + u8 ip6ext; + u8 ah; + u8 esp; + u8 ah_esp; + u8 xport; + u8 tun; + u8 xport_tun; + u8 lso; + u8 extseq; u32 udp_esp; u32 auth; u32 crypto; @@ -1295,8 +1295,8 @@ struct ndis_ipsecv2_offload { }; struct ndis_rsc_offload { - u16 ip4; - u16 ip6; + u8 ip4; + u8 ip6; }; struct ndis_encap_offload { diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 7b18a8c267c2..fe2256bf1d13 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -42,7 +42,6 @@ #include <net/pkt_sched.h> #include <net/checksum.h> #include <net/ip6_checksum.h> -#include <net/failover.h> #include "hyperv_net.h" @@ -68,6 +67,8 @@ static int debug = -1; module_param(debug, int, 0444); MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)"); +static LIST_HEAD(netvsc_dev_list); + static void netvsc_change_rx_flags(struct net_device *net, int change) { struct net_device_context *ndev_ctx = netdev_priv(net); @@ -1780,6 +1781,36 @@ out_unlock: rtnl_unlock(); } +static struct net_device *get_netvsc_bymac(const u8 *mac) +{ + struct net_device_context *ndev_ctx; + + list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) { + struct net_device *dev = hv_get_drvdata(ndev_ctx->device_ctx); + + if (ether_addr_equal(mac, dev->perm_addr)) + return dev; + } + + return NULL; +} + +static struct net_device *get_netvsc_byref(struct net_device *vf_netdev) +{ + struct net_device_context *net_device_ctx; + struct net_device *dev; + + dev = netdev_master_upper_dev_get(vf_netdev); + if (!dev || dev->netdev_ops != &device_ops) + return NULL; /* not a netvsc device */ + + net_device_ctx = netdev_priv(dev); + if (!rtnl_dereference(net_device_ctx->nvdev)) + return NULL; /* device is removed */ + + return dev; +} + /* Called when VF is injecting data into network stack. * Change the associated network device from VF to netvsc. * note: already called with rcu_read_lock @@ -1802,6 +1833,46 @@ static rx_handler_result_t netvsc_vf_handle_frame(struct sk_buff **pskb) return RX_HANDLER_ANOTHER; } +static int netvsc_vf_join(struct net_device *vf_netdev, + struct net_device *ndev) +{ + struct net_device_context *ndev_ctx = netdev_priv(ndev); + int ret; + + ret = netdev_rx_handler_register(vf_netdev, + netvsc_vf_handle_frame, ndev); + if (ret != 0) { + netdev_err(vf_netdev, + "can not register netvsc VF receive handler (err = %d)\n", + ret); + goto rx_handler_failed; + } + + ret = netdev_master_upper_dev_link(vf_netdev, ndev, + NULL, NULL, NULL); + if (ret != 0) { + netdev_err(vf_netdev, + "can not set master device %s (err = %d)\n", + ndev->name, ret); + goto upper_link_failed; + } + + /* set slave flag before open to prevent IPv6 addrconf */ + vf_netdev->flags |= IFF_SLAVE; + + schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT); + + call_netdevice_notifiers(NETDEV_JOIN, vf_netdev); + + netdev_info(vf_netdev, "joined to %s\n", ndev->name); + return 0; + +upper_link_failed: + netdev_rx_handler_unregister(vf_netdev); +rx_handler_failed: + return ret; +} + static void __netvsc_vf_setup(struct net_device *ndev, struct net_device *vf_netdev) { @@ -1852,95 +1923,104 @@ static void netvsc_vf_setup(struct work_struct *w) rtnl_unlock(); } -static int netvsc_pre_register_vf(struct net_device *vf_netdev, - struct net_device *ndev) +static int netvsc_register_vf(struct net_device *vf_netdev) { + struct net_device *ndev; struct net_device_context *net_device_ctx; struct netvsc_device *netvsc_dev; + int ret; + + if (vf_netdev->addr_len != ETH_ALEN) + return NOTIFY_DONE; + + /* + * We will use the MAC address to locate the synthetic interface to + * associate with the VF interface. If we don't find a matching + * synthetic interface, move on. + */ + ndev = get_netvsc_bymac(vf_netdev->perm_addr); + if (!ndev) + return NOTIFY_DONE; net_device_ctx = netdev_priv(ndev); netvsc_dev = rtnl_dereference(net_device_ctx->nvdev); if (!netvsc_dev || rtnl_dereference(net_device_ctx->vf_netdev)) - return -ENODEV; - - return 0; -} + return NOTIFY_DONE; -static int netvsc_register_vf(struct net_device *vf_netdev, - struct net_device *ndev) -{ - struct net_device_context *ndev_ctx = netdev_priv(ndev); - - /* set slave flag before open to prevent IPv6 addrconf */ - vf_netdev->flags |= IFF_SLAVE; - - schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT); + /* if syntihetic interface is a different namespace, + * then move the VF to that namespace; join will be + * done again in that context. + */ + if (!net_eq(dev_net(ndev), dev_net(vf_netdev))) { + ret = dev_change_net_namespace(vf_netdev, + dev_net(ndev), "eth%d"); + if (ret) + netdev_err(vf_netdev, + "could not move to same namespace as %s: %d\n", + ndev->name, ret); + else + netdev_info(vf_netdev, + "VF moved to namespace with: %s\n", + ndev->name); + return NOTIFY_DONE; + } - call_netdevice_notifiers(NETDEV_JOIN, vf_netdev); + netdev_info(ndev, "VF registering: %s\n", vf_netdev->name); - netdev_info(vf_netdev, "joined to %s\n", ndev->name); + if (netvsc_vf_join(vf_netdev, ndev) != 0) + return NOTIFY_DONE; dev_hold(vf_netdev); - rcu_assign_pointer(ndev_ctx->vf_netdev, vf_netdev); - - return 0; + rcu_assign_pointer(net_device_ctx->vf_netdev, vf_netdev); + return NOTIFY_OK; } /* VF up/down change detected, schedule to change data path */ -static int netvsc_vf_changed(struct net_device *vf_netdev, - struct net_device *ndev) +static int netvsc_vf_changed(struct net_device *vf_netdev) { struct net_device_context *net_device_ctx; struct netvsc_device *netvsc_dev; + struct net_device *ndev; bool vf_is_up = netif_running(vf_netdev); + ndev = get_netvsc_byref(vf_netdev); + if (!ndev) + return NOTIFY_DONE; + net_device_ctx = netdev_priv(ndev); netvsc_dev = rtnl_dereference(net_device_ctx->nvdev); if (!netvsc_dev) - return -ENODEV; + return NOTIFY_DONE; netvsc_switch_datapath(ndev, vf_is_up); netdev_info(ndev, "Data path switched %s VF: %s\n", vf_is_up ? "to" : "from", vf_netdev->name); - return 0; + return NOTIFY_OK; } -static int netvsc_pre_unregister_vf(struct net_device *vf_netdev, - struct net_device *ndev) +static int netvsc_unregister_vf(struct net_device *vf_netdev) { + struct net_device *ndev; struct net_device_context *net_device_ctx; - net_device_ctx = netdev_priv(ndev); - cancel_delayed_work_sync(&net_device_ctx->vf_takeover); - - return 0; -} - -static int netvsc_unregister_vf(struct net_device *vf_netdev, - struct net_device *ndev) -{ - struct net_device_context *net_device_ctx; + ndev = get_netvsc_byref(vf_netdev); + if (!ndev) + return NOTIFY_DONE; net_device_ctx = netdev_priv(ndev); + cancel_delayed_work_sync(&net_device_ctx->vf_takeover); netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name); + netdev_rx_handler_unregister(vf_netdev); + netdev_upper_dev_unlink(vf_netdev, ndev); RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL); dev_put(vf_netdev); - return 0; + return NOTIFY_OK; } -static struct failover_ops netvsc_failover_ops = { - .slave_pre_register = netvsc_pre_register_vf, - .slave_register = netvsc_register_vf, - .slave_pre_unregister = netvsc_pre_unregister_vf, - .slave_unregister = netvsc_unregister_vf, - .slave_link_change = netvsc_vf_changed, - .slave_handle_frame = netvsc_vf_handle_frame, -}; - static int netvsc_probe(struct hv_device *dev, const struct hv_vmbus_device_id *dev_id) { @@ -2024,23 +2104,19 @@ static int netvsc_probe(struct hv_device *dev, else net->max_mtu = ETH_DATA_LEN; - ret = register_netdev(net); + rtnl_lock(); + ret = register_netdevice(net); if (ret != 0) { pr_err("Unable to register netdev.\n"); goto register_failed; } - net_device_ctx->failover = failover_register(net, &netvsc_failover_ops); - if (IS_ERR(net_device_ctx->failover)) { - ret = PTR_ERR(net_device_ctx->failover); - goto err_failover; - } - - return ret; + list_add(&net_device_ctx->list, &netvsc_dev_list); + rtnl_unlock(); + return 0; -err_failover: - unregister_netdev(net); register_failed: + rtnl_unlock(); rndis_filter_device_remove(dev, nvdev); rndis_failed: free_percpu(net_device_ctx->vf_stats); @@ -2080,14 +2156,13 @@ static int netvsc_remove(struct hv_device *dev) rtnl_lock(); vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev); if (vf_netdev) - failover_slave_unregister(vf_netdev); + netvsc_unregister_vf(vf_netdev); if (nvdev) rndis_filter_device_remove(dev, nvdev); unregister_netdevice(net); - - failover_unregister(ndev_ctx->failover); + list_del(&ndev_ctx->list); rtnl_unlock(); rcu_read_unlock(); @@ -2115,8 +2190,54 @@ static struct hv_driver netvsc_drv = { .remove = netvsc_remove, }; +/* + * On Hyper-V, every VF interface is matched with a corresponding + * synthetic interface. The synthetic interface is presented first + * to the guest. When the corresponding VF instance is registered, + * we will take care of switching the data path. + */ +static int netvsc_netdev_event(struct notifier_block *this, + unsigned long event, void *ptr) +{ + struct net_device *event_dev = netdev_notifier_info_to_dev(ptr); + + /* Skip our own events */ + if (event_dev->netdev_ops == &device_ops) + return NOTIFY_DONE; + + /* Avoid non-Ethernet type devices */ + if (event_dev->type != ARPHRD_ETHER) + return NOTIFY_DONE; + + /* Avoid Vlan dev with same MAC registering as VF */ + if (is_vlan_dev(event_dev)) + return NOTIFY_DONE; + + /* Avoid Bonding master dev with same MAC registering as VF */ + if ((event_dev->priv_flags & IFF_BONDING) && + (event_dev->flags & IFF_MASTER)) + return NOTIFY_DONE; + + switch (event) { + case NETDEV_REGISTER: + return netvsc_register_vf(event_dev); + case NETDEV_UNREGISTER: + return netvsc_unregister_vf(event_dev); + case NETDEV_UP: + case NETDEV_DOWN: + return netvsc_vf_changed(event_dev); + default: + return NOTIFY_DONE; + } +} + +static struct notifier_block netvsc_netdev_notifier = { + .notifier_call = netvsc_netdev_event, +}; + static void __exit netvsc_drv_exit(void) { + unregister_netdevice_notifier(&netvsc_netdev_notifier); vmbus_driver_unregister(&netvsc_drv); } @@ -2135,6 +2256,7 @@ static int __init netvsc_drv_init(void) if (ret) return ret; + register_netdevice_notifier(&netvsc_netdev_notifier); return 0; } diff --git a/drivers/net/phy/mdio-gpio.c b/drivers/net/phy/mdio-gpio.c index 4e4c8daf44c3..33265747bf39 100644 --- a/drivers/net/phy/mdio-gpio.c +++ b/drivers/net/phy/mdio-gpio.c @@ -26,10 +26,7 @@ #include <linux/platform_device.h> #include <linux/mdio-bitbang.h> #include <linux/mdio-gpio.h> -#include <linux/gpio.h> #include <linux/gpio/consumer.h> - -#include <linux/of_gpio.h> #include <linux/of_mdio.h> struct mdio_gpio_info { diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 9825bfd42abc..18e819d964f1 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -3572,11 +3572,14 @@ static int __init init_mac80211_hwsim(void) hwsim_wq = alloc_workqueue("hwsim_wq", 0, 0); if (!hwsim_wq) return -ENOMEM; - rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params); + + err = rhashtable_init(&hwsim_radios_rht, &hwsim_rht_params); + if (err) + goto out_free_wq; err = register_pernet_device(&hwsim_net_ops); if (err) - return err; + goto out_free_rht; err = platform_driver_register(&mac80211_hwsim_driver); if (err) @@ -3701,6 +3704,10 @@ out_unregister_driver: platform_driver_unregister(&mac80211_hwsim_driver); out_unregister_pernet: unregister_pernet_device(&hwsim_net_ops); +out_free_rht: + rhashtable_destroy(&hwsim_radios_rht); +out_free_wq: + destroy_workqueue(hwsim_wq); return err; } module_init(init_mac80211_hwsim); diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 679da1abd73c..922ce0abf5cf 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -239,7 +239,7 @@ static void rx_refill_timeout(struct timer_list *t) static int netfront_tx_slot_available(struct netfront_queue *queue) { return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) < - (NET_TX_RING_SIZE - MAX_SKB_FRAGS - 2); + (NET_TX_RING_SIZE - XEN_NETIF_NR_SLOTS_MIN - 1); } static void xennet_maybe_wake_tx(struct netfront_queue *queue) @@ -790,7 +790,7 @@ static int xennet_get_responses(struct netfront_queue *queue, RING_IDX cons = queue->rx.rsp_cons; struct sk_buff *skb = xennet_get_rx_skb(queue, cons); grant_ref_t ref = xennet_get_rx_ref(queue, cons); - int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); + int max = XEN_NETIF_NR_SLOTS_MIN + (rx->status <= RX_COPY_THRESHOLD); int slots = 1; int err = 0; unsigned long ret; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index ce8c95b6365b..a502f1af4a21 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2349,6 +2349,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type) struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL); if (!node) return NULL; + + /* Make sure all padding within the structure is initialized. */ + memset(&node->msg, 0, sizeof node->msg); node->vq = vq; node->msg.type = type; return node; diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig index d94254263ea5..591a13a59787 100644 --- a/drivers/video/fbdev/Kconfig +++ b/drivers/video/fbdev/Kconfig @@ -1437,7 +1437,7 @@ config FB_SIS_315 config FB_VIA tristate "VIA UniChrome (Pro) and Chrome9 display support" - depends on FB && PCI && X86 && GPIOLIB && I2C + depends on FB && PCI && GPIOLIB && I2C && (X86 || COMPILE_TEST) select FB_CFB_FILLRECT select FB_CFB_COPYAREA select FB_CFB_IMAGEBLIT @@ -1888,7 +1888,6 @@ config FB_W100 config FB_SH_MOBILE_LCDC tristate "SuperH Mobile LCDC framebuffer support" depends on FB && (SUPERH || ARCH_RENESAS) && HAVE_CLK - depends on FB_SH_MOBILE_MERAM || !FB_SH_MOBILE_MERAM select FB_SYS_FILLRECT select FB_SYS_COPYAREA select FB_SYS_IMAGEBLIT @@ -2253,39 +2252,6 @@ config FB_BROADSHEET and could also have been called by other names when coupled with a bridge adapter. -config FB_AUO_K190X - tristate "AUO-K190X EPD controller support" - depends on FB - select FB_SYS_FILLRECT - select FB_SYS_COPYAREA - select FB_SYS_IMAGEBLIT - select FB_SYS_FOPS - select FB_DEFERRED_IO - help - Provides support for epaper controllers from the K190X series - of AUO. These controllers can be used to drive epaper displays - from Sipix. - - This option enables the common support, shared by the individual - controller drivers. You will also have to enable the driver - for the controller type used in your device. - -config FB_AUO_K1900 - tristate "AUO-K1900 EPD controller support" - depends on FB && FB_AUO_K190X - help - This driver implements support for the AUO K1900 epd-controller. - This controller can drive Sipix epaper displays but can only do - serial updates, reducing the number of possible frames per second. - -config FB_AUO_K1901 - tristate "AUO-K1901 EPD controller support" - depends on FB && FB_AUO_K190X - help - This driver implements support for the AUO K1901 epd-controller. - This controller can drive Sipix epaper displays and supports - concurrent updates, making higher frames per second possible. - config FB_JZ4740 tristate "JZ4740 LCD framebuffer support" depends on FB && MACH_JZ4740 @@ -2346,18 +2312,6 @@ source "drivers/video/fbdev/omap/Kconfig" source "drivers/video/fbdev/omap2/Kconfig" source "drivers/video/fbdev/mmp/Kconfig" -config FB_SH_MOBILE_MERAM - tristate "SuperH Mobile MERAM read ahead support" - depends on (SUPERH || ARCH_SHMOBILE) - select GENERIC_ALLOCATOR - ---help--- - Enable MERAM support for the SuperH controller. - - This will allow for caching of the framebuffer to provide more - reliable access under heavy main memory bus traffic situations. - Up to 4 memory channels can be configured, allowing 4 RGB or - 2 YCbCr framebuffers to be configured. - config FB_SSD1307 tristate "Solomon SSD1307 framebuffer support" depends on FB && I2C diff --git a/drivers/video/fbdev/Makefile b/drivers/video/fbdev/Makefile index 55282a21b500..13c900320c2c 100644 --- a/drivers/video/fbdev/Makefile +++ b/drivers/video/fbdev/Makefile @@ -100,9 +100,6 @@ obj-$(CONFIG_FB_PMAGB_B) += pmagb-b-fb.o obj-$(CONFIG_FB_MAXINE) += maxinefb.o obj-$(CONFIG_FB_METRONOME) += metronomefb.o obj-$(CONFIG_FB_BROADSHEET) += broadsheetfb.o -obj-$(CONFIG_FB_AUO_K190X) += auo_k190x.o -obj-$(CONFIG_FB_AUO_K1900) += auo_k1900fb.o -obj-$(CONFIG_FB_AUO_K1901) += auo_k1901fb.o obj-$(CONFIG_FB_S1D13XXX) += s1d13xxxfb.o obj-$(CONFIG_FB_SH7760) += sh7760fb.o obj-$(CONFIG_FB_IMX) += imxfb.o @@ -116,7 +113,6 @@ obj-$(CONFIG_FB_SM501) += sm501fb.o obj-$(CONFIG_FB_UDL) += udlfb.o obj-$(CONFIG_FB_SMSCUFX) += smscufx.o obj-$(CONFIG_FB_XILINX) += xilinxfb.o -obj-$(CONFIG_FB_SH_MOBILE_MERAM) += sh_mobile_meram.o obj-$(CONFIG_FB_SH_MOBILE_LCDC) += sh_mobile_lcdcfb.o obj-$(CONFIG_FB_OMAP) += omap/ obj-y += omap2/ diff --git a/drivers/video/fbdev/aty/aty128fb.c b/drivers/video/fbdev/aty/aty128fb.c index 09b0e558dce8..6cc46867ff57 100644 --- a/drivers/video/fbdev/aty/aty128fb.c +++ b/drivers/video/fbdev/aty/aty128fb.c @@ -2442,7 +2442,7 @@ static void aty128_set_suspend(struct aty128fb_par *par, int suspend) (void)aty_ld_pll(POWER_MANAGEMENT); aty_st_le32(BUS_CNTL1, 0x00000010); aty_st_le32(MEM_POWER_MISC, 0x0c830000); - mdelay(100); + msleep(100); /* Switch PCI power management to D2 */ pci_set_power_state(pdev, PCI_D2); diff --git a/drivers/video/fbdev/aty/radeon_pm.c b/drivers/video/fbdev/aty/radeon_pm.c index 7137c12cbcee..e695adb0e573 100644 --- a/drivers/video/fbdev/aty/radeon_pm.c +++ b/drivers/video/fbdev/aty/radeon_pm.c @@ -2678,17 +2678,17 @@ int radeonfb_pci_suspend(struct pci_dev *pdev, pm_message_t mesg) * it, we'll restore the dynamic clocks state on wakeup */ radeon_pm_disable_dynamic_mode(rinfo); - mdelay(50); + msleep(50); radeon_pm_save_regs(rinfo, 1); if (rinfo->is_mobility && !(rinfo->pm_mode & radeon_pm_d2)) { /* Switch off LVDS interface */ - mdelay(1); + usleep_range(1000, 2000); OUTREG(LVDS_GEN_CNTL, INREG(LVDS_GEN_CNTL) & ~(LVDS_BL_MOD_EN)); - mdelay(1); + usleep_range(1000, 2000); OUTREG(LVDS_GEN_CNTL, INREG(LVDS_GEN_CNTL) & ~(LVDS_EN | LVDS_ON)); OUTREG(LVDS_PLL_CNTL, (INREG(LVDS_PLL_CNTL) & ~30000) | 0x20000); - mdelay(20); + msleep(20); OUTREG(LVDS_GEN_CNTL, INREG(LVDS_GEN_CNTL) & ~(LVDS_DIGON)); } pci_disable_device(pdev); diff --git a/drivers/video/fbdev/au1100fb.c b/drivers/video/fbdev/au1100fb.c index d555a78df5c6..0adf0683cf08 100644 --- a/drivers/video/fbdev/au1100fb.c +++ b/drivers/video/fbdev/au1100fb.c @@ -464,7 +464,7 @@ static int au1100fb_drv_probe(struct platform_device *dev) PAGE_ALIGN(fbdev->fb_len), &fbdev->fb_phys, GFP_KERNEL); if (!fbdev->fb_mem) { - print_err("fail to allocate frambuffer (size: %dK))", + print_err("fail to allocate framebuffer (size: %dK))", fbdev->fb_len / 1024); return -ENOMEM; } diff --git a/drivers/video/fbdev/au1200fb.c b/drivers/video/fbdev/au1200fb.c index 87d5a62bf6ca..3872ccef4cb2 100644 --- a/drivers/video/fbdev/au1200fb.c +++ b/drivers/video/fbdev/au1200fb.c @@ -1696,7 +1696,7 @@ static int au1200fb_drv_probe(struct platform_device *dev) &fbdev->fb_phys, GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); if (!fbdev->fb_mem) { - print_err("fail to allocate frambuffer (size: %dK))", + print_err("fail to allocate framebuffer (size: %dK))", fbdev->fb_len / 1024); ret = -ENOMEM; goto failed; diff --git a/drivers/video/fbdev/auo_k1900fb.c b/drivers/video/fbdev/auo_k1900fb.c deleted file mode 100644 index 7637c60eae3d..000000000000 --- a/drivers/video/fbdev/auo_k1900fb.c +++ /dev/null @@ -1,204 +0,0 @@ -/* - * auok190xfb.c -- FB driver for AUO-K1900 controllers - * - * Copyright (C) 2011, 2012 Heiko Stuebner <heiko@sntech.de> - * - * based on broadsheetfb.c - * - * Copyright (C) 2008, Jaya Kumar - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * Layout is based on skeletonfb.c by James Simmons and Geert Uytterhoeven. - * - * This driver is written to be used with the AUO-K1900 display controller. - * - * It is intended to be architecture independent. A board specific driver - * must be used to perform all the physical IO interactions. - * - * The controller supports different update modes: - * mode0+1 16 step gray (4bit) - * mode2 4 step gray (2bit) - FIXME: add strange refresh - * mode3 2 step gray (1bit) - FIXME: add strange refresh - * mode4 handwriting mode (strange behaviour) - * mode5 automatic selection of update mode - */ - -#include <linux/module.h> -#include <linux/kernel.h> -#include <linux/errno.h> -#include <linux/string.h> -#include <linux/mm.h> -#include <linux/slab.h> -#include <linux/delay.h> -#include <linux/interrupt.h> -#include <linux/fb.h> -#include <linux/init.h> -#include <linux/platform_device.h> -#include <linux/list.h> -#include <linux/firmware.h> -#include <linux/gpio.h> -#include <linux/pm_runtime.h> - -#include <video/auo_k190xfb.h> - -#include "auo_k190x.h" - -/* - * AUO-K1900 specific commands - */ - -#define AUOK1900_CMD_PARTIALDISP 0x1001 -#define AUOK1900_CMD_ROTATION 0x1006 -#define AUOK1900_CMD_LUT_STOP 0x1009 - -#define AUOK1900_INIT_TEMP_AVERAGE (1 << 13) -#define AUOK1900_INIT_ROTATE(_x) ((_x & 0x3) << 10) -#define AUOK1900_INIT_RESOLUTION(_res) ((_res & 0x7) << 2) - -static void auok1900_init(struct auok190xfb_par *par) -{ - struct device *dev = par->info->device; - struct auok190x_board *board = par->board; - u16 init_param = 0; - - pm_runtime_get_sync(dev); - - init_param |= AUOK1900_INIT_TEMP_AVERAGE; - init_param |= AUOK1900_INIT_ROTATE(par->rotation); - init_param |= AUOK190X_INIT_INVERSE_WHITE; - init_param |= AUOK190X_INIT_FORMAT0; - init_param |= AUOK1900_INIT_RESOLUTION(par->resolution); - init_param |= AUOK190X_INIT_SHIFT_RIGHT; - - auok190x_send_cmdargs(par, AUOK190X_CMD_INIT, 1, &init_param); - - /* let the controller finish */ - board->wait_for_rdy(par); - - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); -} - -static void auok1900_update_region(struct auok190xfb_par *par, int mode, - u16 y1, u16 y2) -{ - struct device *dev = par->info->device; - unsigned char *buf = (unsigned char *)par->info->screen_base; - int xres = par->info->var.xres; - int line_length = par->info->fix.line_length; - u16 args[4]; - - pm_runtime_get_sync(dev); - - mutex_lock(&(par->io_lock)); - - /* y1 and y2 must be a multiple of 2 so drop the lowest bit */ - y1 &= 0xfffe; - y2 &= 0xfffe; - - dev_dbg(dev, "update (x,y,w,h,mode)=(%d,%d,%d,%d,%d)\n", - 1, y1+1, xres, y2-y1, mode); - - /* to FIX handle different partial update modes */ - args[0] = mode | 1; - args[1] = y1 + 1; - args[2] = xres; - args[3] = y2 - y1; - buf += y1 * line_length; - auok190x_send_cmdargs_pixels(par, AUOK1900_CMD_PARTIALDISP, 4, args, - ((y2 - y1) * line_length)/2, (u16 *) buf); - auok190x_send_command(par, AUOK190X_CMD_DATA_STOP); - - par->update_cnt++; - - mutex_unlock(&(par->io_lock)); - - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); -} - -static void auok1900fb_dpy_update_pages(struct auok190xfb_par *par, - u16 y1, u16 y2) -{ - int mode; - - if (par->update_mode < 0) { - mode = AUOK190X_UPDATE_MODE(1); - par->last_mode = -1; - } else { - mode = AUOK190X_UPDATE_MODE(par->update_mode); - par->last_mode = par->update_mode; - } - - if (par->flash) - mode |= AUOK190X_UPDATE_NONFLASH; - - auok1900_update_region(par, mode, y1, y2); -} - -static void auok1900fb_dpy_update(struct auok190xfb_par *par) -{ - int mode; - - if (par->update_mode < 0) { - mode = AUOK190X_UPDATE_MODE(0); - par->last_mode = -1; - } else { - mode = AUOK190X_UPDATE_MODE(par->update_mode); - par->last_mode = par->update_mode; - } - - if (par->flash) - mode |= AUOK190X_UPDATE_NONFLASH; - - auok1900_update_region(par, mode, 0, par->info->var.yres); - par->update_cnt = 0; -} - -static bool auok1900fb_need_refresh(struct auok190xfb_par *par) -{ - return (par->update_cnt > 10); -} - -static int auok1900fb_probe(struct platform_device *pdev) -{ - struct auok190x_init_data init; - struct auok190x_board *board; - - /* pick up board specific routines */ - board = pdev->dev.platform_data; - if (!board) - return -EINVAL; - - /* fill temporary init struct for common init */ - init.id = "auo_k1900fb"; - init.board = board; - init.update_partial = auok1900fb_dpy_update_pages; - init.update_all = auok1900fb_dpy_update; - init.need_refresh = auok1900fb_need_refresh; - init.init = auok1900_init; - - return auok190x_common_probe(pdev, &init); -} - -static int auok1900fb_remove(struct platform_device *pdev) -{ - return auok190x_common_remove(pdev); -} - -static struct platform_driver auok1900fb_driver = { - .probe = auok1900fb_probe, - .remove = auok1900fb_remove, - .driver = { - .name = "auo_k1900fb", - .pm = &auok190x_pm, - }, -}; -module_platform_driver(auok1900fb_driver); - -MODULE_DESCRIPTION("framebuffer driver for the AUO-K1900 EPD controller"); -MODULE_AUTHOR("Heiko Stuebner <heiko@sntech.de>"); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/auo_k1901fb.c b/drivers/video/fbdev/auo_k1901fb.c deleted file mode 100644 index 681fe61957b6..000000000000 --- a/drivers/video/fbdev/auo_k1901fb.c +++ /dev/null @@ -1,257 +0,0 @@ -/* - * auok190xfb.c -- FB driver for AUO-K1901 controllers - * - * Copyright (C) 2011, 2012 Heiko Stuebner <heiko@sntech.de> - * - * based on broadsheetfb.c - * - * Copyright (C) 2008, Jaya Kumar - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * Layout is based on skeletonfb.c by James Simmons and Geert Uytterhoeven. - * - * This driver is written to be used with the AUO-K1901 display controller. - * - * It is intended to be architecture independent. A board specific driver - * must be used to perform all the physical IO interactions. - * - * The controller supports different update modes: - * mode0+1 16 step gray (4bit) - * mode2+3 4 step gray (2bit) - * mode4+5 2 step gray (1bit) - * - mode4 is described as "without LUT" - * mode7 automatic selection of update mode - * - * The most interesting difference to the K1900 is the ability to do screen - * updates in an asynchronous fashion. Where the K1900 needs to wait for the - * current update to complete, the K1901 can process later updates already. - */ - -#include <linux/module.h> -#include <linux/kernel.h> -#include <linux/errno.h> -#include <linux/string.h> -#include <linux/mm.h> -#include <linux/slab.h> -#include <linux/delay.h> -#include <linux/interrupt.h> -#include <linux/fb.h> -#include <linux/init.h> -#include <linux/platform_device.h> -#include <linux/list.h> -#include <linux/firmware.h> -#include <linux/gpio.h> -#include <linux/pm_runtime.h> - -#include <video/auo_k190xfb.h> - -#include "auo_k190x.h" - -/* - * AUO-K1901 specific commands - */ - -#define AUOK1901_CMD_LUT_INTERFACE 0x0005 -#define AUOK1901_CMD_DMA_START 0x1001 -#define AUOK1901_CMD_CURSOR_START 0x1007 -#define AUOK1901_CMD_CURSOR_STOP AUOK190X_CMD_DATA_STOP -#define AUOK1901_CMD_DDMA_START 0x1009 - -#define AUOK1901_INIT_GATE_PULSE_LOW (0 << 14) -#define AUOK1901_INIT_GATE_PULSE_HIGH (1 << 14) -#define AUOK1901_INIT_SINGLE_GATE (0 << 13) -#define AUOK1901_INIT_DOUBLE_GATE (1 << 13) - -/* Bits to pixels - * Mode 15-12 11-8 7-4 3-0 - * format2 2 T 1 T - * format3 1 T 2 T - * format4 T 2 T 1 - * format5 T 1 T 2 - * - * halftone modes: - * format6 2 2 1 1 - * format7 1 1 2 2 - */ -#define AUOK1901_INIT_FORMAT2 (1 << 7) -#define AUOK1901_INIT_FORMAT3 ((1 << 7) | (1 << 6)) -#define AUOK1901_INIT_FORMAT4 (1 << 8) -#define AUOK1901_INIT_FORMAT5 ((1 << 8) | (1 << 6)) -#define AUOK1901_INIT_FORMAT6 ((1 << 8) | (1 << 7)) -#define AUOK1901_INIT_FORMAT7 ((1 << 8) | (1 << 7) | (1 << 6)) - -/* res[4] to bit 10 - * res[3-0] to bits 5-2 - */ -#define AUOK1901_INIT_RESOLUTION(_res) (((_res & (1 << 4)) << 6) \ - | ((_res & 0xf) << 2)) - -/* - * portrait / landscape orientation in AUOK1901_CMD_DMA_START - */ -#define AUOK1901_DMA_ROTATE90(_rot) ((_rot & 1) << 13) - -/* - * equivalent to 1 << 11, needs the ~ to have same rotation like K1900 - */ -#define AUOK1901_DDMA_ROTATE180(_rot) ((~_rot & 2) << 10) - -static void auok1901_init(struct auok190xfb_par *par) -{ - struct device *dev = par->info->device; - struct auok190x_board *board = par->board; - u16 init_param = 0; - - pm_runtime_get_sync(dev); - - init_param |= AUOK190X_INIT_INVERSE_WHITE; - init_param |= AUOK190X_INIT_FORMAT0; - init_param |= AUOK1901_INIT_RESOLUTION(par->resolution); - init_param |= AUOK190X_INIT_SHIFT_LEFT; - - auok190x_send_cmdargs(par, AUOK190X_CMD_INIT, 1, &init_param); - - /* let the controller finish */ - board->wait_for_rdy(par); - - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); -} - -static void auok1901_update_region(struct auok190xfb_par *par, int mode, - u16 y1, u16 y2) -{ - struct device *dev = par->info->device; - unsigned char *buf = (unsigned char *)par->info->screen_base; - int xres = par->info->var.xres; - int line_length = par->info->fix.line_length; - u16 args[5]; - - pm_runtime_get_sync(dev); - - mutex_lock(&(par->io_lock)); - - /* y1 and y2 must be a multiple of 2 so drop the lowest bit */ - y1 &= 0xfffe; - y2 &= 0xfffe; - - dev_dbg(dev, "update (x,y,w,h,mode)=(%d,%d,%d,%d,%d)\n", - 1, y1+1, xres, y2-y1, mode); - - /* K1901: first transfer the region data */ - args[0] = AUOK1901_DMA_ROTATE90(par->rotation) | 1; - args[1] = y1 + 1; - args[2] = xres; - args[3] = y2 - y1; - buf += y1 * line_length; - auok190x_send_cmdargs_pixels_nowait(par, AUOK1901_CMD_DMA_START, 4, - args, ((y2 - y1) * line_length)/2, - (u16 *) buf); - auok190x_send_command_nowait(par, AUOK190X_CMD_DATA_STOP); - - /* K1901: second tell the controller to update the region with mode */ - args[0] = mode | AUOK1901_DDMA_ROTATE180(par->rotation); - args[1] = 1; - args[2] = y1 + 1; - args[3] = xres; - args[4] = y2 - y1; - auok190x_send_cmdargs_nowait(par, AUOK1901_CMD_DDMA_START, 5, args); - - par->update_cnt++; - - mutex_unlock(&(par->io_lock)); - - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); -} - -static void auok1901fb_dpy_update_pages(struct auok190xfb_par *par, - u16 y1, u16 y2) -{ - int mode; - - if (par->update_mode < 0) { - mode = AUOK190X_UPDATE_MODE(1); - par->last_mode = -1; - } else { - mode = AUOK190X_UPDATE_MODE(par->update_mode); - par->last_mode = par->update_mode; - } - - if (par->flash) - mode |= AUOK190X_UPDATE_NONFLASH; - - auok1901_update_region(par, mode, y1, y2); -} - -static void auok1901fb_dpy_update(struct auok190xfb_par *par) -{ - int mode; - - /* When doing full updates, wait for the controller to be ready - * This will hopefully catch some hangs of the K1901 - */ - par->board->wait_for_rdy(par); - - if (par->update_mode < 0) { - mode = AUOK190X_UPDATE_MODE(0); - par->last_mode = -1; - } else { - mode = AUOK190X_UPDATE_MODE(par->update_mode); - par->last_mode = par->update_mode; - } - - if (par->flash) - mode |= AUOK190X_UPDATE_NONFLASH; - - auok1901_update_region(par, mode, 0, par->info->var.yres); - par->update_cnt = 0; -} - -static bool auok1901fb_need_refresh(struct auok190xfb_par *par) -{ - return (par->update_cnt > 10); -} - -static int auok1901fb_probe(struct platform_device *pdev) -{ - struct auok190x_init_data init; - struct auok190x_board *board; - - /* pick up board specific routines */ - board = pdev->dev.platform_data; - if (!board) - return -EINVAL; - - /* fill temporary init struct for common init */ - init.id = "auo_k1901fb"; - init.board = board; - init.update_partial = auok1901fb_dpy_update_pages; - init.update_all = auok1901fb_dpy_update; - init.need_refresh = auok1901fb_need_refresh; - init.init = auok1901_init; - - return auok190x_common_probe(pdev, &init); -} - -static int auok1901fb_remove(struct platform_device *pdev) -{ - return auok190x_common_remove(pdev); -} - -static struct platform_driver auok1901fb_driver = { - .probe = auok1901fb_probe, - .remove = auok1901fb_remove, - .driver = { - .name = "auo_k1901fb", - .pm = &auok190x_pm, - }, -}; -module_platform_driver(auok1901fb_driver); - -MODULE_DESCRIPTION("framebuffer driver for the AUO-K1901 EPD controller"); -MODULE_AUTHOR("Heiko Stuebner <heiko@sntech.de>"); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/auo_k190x.c b/drivers/video/fbdev/auo_k190x.c deleted file mode 100644 index 9d24d1b3e9ef..000000000000 --- a/drivers/video/fbdev/auo_k190x.c +++ /dev/null @@ -1,1195 +0,0 @@ -/* - * Common code for AUO-K190X framebuffer drivers - * - * Copyright (C) 2012 Heiko Stuebner <heiko@sntech.de> - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - */ - -#include <linux/module.h> -#include <linux/sched/mm.h> -#include <linux/kernel.h> -#include <linux/gpio.h> -#include <linux/platform_device.h> -#include <linux/pm_runtime.h> -#include <linux/fb.h> -#include <linux/delay.h> -#include <linux/uaccess.h> -#include <linux/vmalloc.h> -#include <linux/regulator/consumer.h> - -#include <video/auo_k190xfb.h> - -#include "auo_k190x.h" - -struct panel_info { - int w; - int h; -}; - -/* table of panel specific parameters to be indexed into by the board drivers */ -static struct panel_info panel_table[] = { - /* standard 6" */ - [AUOK190X_RESOLUTION_800_600] = { - .w = 800, - .h = 600, - }, - /* standard 9" */ - [AUOK190X_RESOLUTION_1024_768] = { - .w = 1024, - .h = 768, - }, - [AUOK190X_RESOLUTION_600_800] = { - .w = 600, - .h = 800, - }, - [AUOK190X_RESOLUTION_768_1024] = { - .w = 768, - .h = 1024, - }, -}; - -/* - * private I80 interface to the board driver - */ - -static void auok190x_issue_data(struct auok190xfb_par *par, u16 data) -{ - par->board->set_ctl(par, AUOK190X_I80_WR, 0); - par->board->set_hdb(par, data); - par->board->set_ctl(par, AUOK190X_I80_WR, 1); -} - -static void auok190x_issue_cmd(struct auok190xfb_par *par, u16 data) -{ - par->board->set_ctl(par, AUOK190X_I80_DC, 0); - auok190x_issue_data(par, data); - par->board->set_ctl(par, AUOK190X_I80_DC, 1); -} - -/** - * Conversion of 16bit color to 4bit grayscale - * does roughly (0.3 * R + 0.6 G + 0.1 B) / 2 - */ -static inline int rgb565_to_gray4(u16 data, struct fb_var_screeninfo *var) -{ - return ((((data & 0xF800) >> var->red.offset) * 77 + - ((data & 0x07E0) >> (var->green.offset + 1)) * 151 + - ((data & 0x1F) >> var->blue.offset) * 28) >> 8 >> 1); -} - -static int auok190x_issue_pixels_rgb565(struct auok190xfb_par *par, int size, - u16 *data) -{ - struct fb_var_screeninfo *var = &par->info->var; - struct device *dev = par->info->device; - int i; - u16 tmp; - - if (size & 7) { - dev_err(dev, "issue_pixels: size %d must be a multiple of 8\n", - size); - return -EINVAL; - } - - for (i = 0; i < (size >> 2); i++) { - par->board->set_ctl(par, AUOK190X_I80_WR, 0); - - tmp = (rgb565_to_gray4(data[4*i], var) & 0x000F); - tmp |= (rgb565_to_gray4(data[4*i+1], var) << 4) & 0x00F0; - tmp |= (rgb565_to_gray4(data[4*i+2], var) << 8) & 0x0F00; - tmp |= (rgb565_to_gray4(data[4*i+3], var) << 12) & 0xF000; - - par->board->set_hdb(par, tmp); - par->board->set_ctl(par, AUOK190X_I80_WR, 1); - } - - return 0; -} - -static int auok190x_issue_pixels_gray8(struct auok190xfb_par *par, int size, - u16 *data) -{ - struct device *dev = par->info->device; - int i; - u16 tmp; - - if (size & 3) { - dev_err(dev, "issue_pixels: size %d must be a multiple of 4\n", - size); - return -EINVAL; - } - - for (i = 0; i < (size >> 1); i++) { - par->board->set_ctl(par, AUOK190X_I80_WR, 0); - - /* simple reduction of 8bit staticgray to 4bit gray - * combines 4 * 4bit pixel values into a 16bit value - */ - tmp = (data[2*i] & 0xF0) >> 4; - tmp |= (data[2*i] & 0xF000) >> 8; - tmp |= (data[2*i+1] & 0xF0) << 4; - tmp |= (data[2*i+1] & 0xF000); - - par->board->set_hdb(par, tmp); - par->board->set_ctl(par, AUOK190X_I80_WR, 1); - } - - return 0; -} - -static int auok190x_issue_pixels(struct auok190xfb_par *par, int size, - u16 *data) -{ - struct fb_info *info = par->info; - struct device *dev = par->info->device; - - if (info->var.bits_per_pixel == 8 && info->var.grayscale) - auok190x_issue_pixels_gray8(par, size, data); - else if (info->var.bits_per_pixel == 16) - auok190x_issue_pixels_rgb565(par, size, data); - else - dev_err(dev, "unsupported color mode (bits: %d, gray: %d)\n", - info->var.bits_per_pixel, info->var.grayscale); - - return 0; -} - -static u16 auok190x_read_data(struct auok190xfb_par *par) -{ - u16 data; - - par->board->set_ctl(par, AUOK190X_I80_OE, 0); - data = par->board->get_hdb(par); - par->board->set_ctl(par, AUOK190X_I80_OE, 1); - - return data; -} - -/* - * Command interface for the controller drivers - */ - -void auok190x_send_command_nowait(struct auok190xfb_par *par, u16 data) -{ - par->board->set_ctl(par, AUOK190X_I80_CS, 0); - auok190x_issue_cmd(par, data); - par->board->set_ctl(par, AUOK190X_I80_CS, 1); -} -EXPORT_SYMBOL_GPL(auok190x_send_command_nowait); - -void auok190x_send_cmdargs_nowait(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv) -{ - int i; - - par->board->set_ctl(par, AUOK190X_I80_CS, 0); - auok190x_issue_cmd(par, cmd); - - for (i = 0; i < argc; i++) - auok190x_issue_data(par, argv[i]); - par->board->set_ctl(par, AUOK190X_I80_CS, 1); -} -EXPORT_SYMBOL_GPL(auok190x_send_cmdargs_nowait); - -int auok190x_send_command(struct auok190xfb_par *par, u16 data) -{ - int ret; - - ret = par->board->wait_for_rdy(par); - if (ret) - return ret; - - auok190x_send_command_nowait(par, data); - return 0; -} -EXPORT_SYMBOL_GPL(auok190x_send_command); - -int auok190x_send_cmdargs(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv) -{ - int ret; - - ret = par->board->wait_for_rdy(par); - if (ret) - return ret; - - auok190x_send_cmdargs_nowait(par, cmd, argc, argv); - return 0; -} -EXPORT_SYMBOL_GPL(auok190x_send_cmdargs); - -int auok190x_read_cmdargs(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv) -{ - int i, ret; - - ret = par->board->wait_for_rdy(par); - if (ret) - return ret; - - par->board->set_ctl(par, AUOK190X_I80_CS, 0); - auok190x_issue_cmd(par, cmd); - - for (i = 0; i < argc; i++) - argv[i] = auok190x_read_data(par); - par->board->set_ctl(par, AUOK190X_I80_CS, 1); - - return 0; -} -EXPORT_SYMBOL_GPL(auok190x_read_cmdargs); - -void auok190x_send_cmdargs_pixels_nowait(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv, int size, u16 *data) -{ - int i; - - par->board->set_ctl(par, AUOK190X_I80_CS, 0); - - auok190x_issue_cmd(par, cmd); - - for (i = 0; i < argc; i++) - auok190x_issue_data(par, argv[i]); - - auok190x_issue_pixels(par, size, data); - - par->board->set_ctl(par, AUOK190X_I80_CS, 1); -} -EXPORT_SYMBOL_GPL(auok190x_send_cmdargs_pixels_nowait); - -int auok190x_send_cmdargs_pixels(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv, int size, u16 *data) -{ - int ret; - - ret = par->board->wait_for_rdy(par); - if (ret) - return ret; - - auok190x_send_cmdargs_pixels_nowait(par, cmd, argc, argv, size, data); - - return 0; -} -EXPORT_SYMBOL_GPL(auok190x_send_cmdargs_pixels); - -/* - * fbdefio callbacks - common on both controllers. - */ - -static void auok190xfb_dpy_first_io(struct fb_info *info) -{ - /* tell runtime-pm that we wish to use the device in a short time */ - pm_runtime_get(info->device); -} - -/* this is called back from the deferred io workqueue */ -static void auok190xfb_dpy_deferred_io(struct fb_info *info, - struct list_head *pagelist) -{ - struct fb_deferred_io *fbdefio = info->fbdefio; - struct auok190xfb_par *par = info->par; - u16 line_length = info->fix.line_length; - u16 yres = info->var.yres; - u16 y1 = 0, h = 0; - int prev_index = -1; - struct page *cur; - int h_inc; - int threshold; - - if (!list_empty(pagelist)) - /* the device resume should've been requested through first_io, - * if the resume did not finish until now, wait for it. - */ - pm_runtime_barrier(info->device); - else - /* We reached this via the fsync or some other way. - * In either case the first_io function did not run, - * so we runtime_resume the device here synchronously. - */ - pm_runtime_get_sync(info->device); - - /* Do a full screen update every n updates to prevent - * excessive darkening of the Sipix display. - * If we do this, there is no need to walk the pages. - */ - if (par->need_refresh(par)) { - par->update_all(par); - goto out; - } - - /* height increment is fixed per page */ - h_inc = DIV_ROUND_UP(PAGE_SIZE , line_length); - - /* calculate number of pages from pixel height */ - threshold = par->consecutive_threshold / h_inc; - if (threshold < 1) - threshold = 1; - - /* walk the written page list and swizzle the data */ - list_for_each_entry(cur, &fbdefio->pagelist, lru) { - if (prev_index < 0) { - /* just starting so assign first page */ - y1 = (cur->index << PAGE_SHIFT) / line_length; - h = h_inc; - } else if ((cur->index - prev_index) <= threshold) { - /* page is within our threshold for single updates */ - h += h_inc * (cur->index - prev_index); - } else { - /* page not consecutive, issue previous update first */ - par->update_partial(par, y1, y1 + h); - - /* start over with our non consecutive page */ - y1 = (cur->index << PAGE_SHIFT) / line_length; - h = h_inc; - } - prev_index = cur->index; - } - - /* if we still have any pages to update we do so now */ - if (h >= yres) - /* its a full screen update, just do it */ - par->update_all(par); - else - par->update_partial(par, y1, min((u16) (y1 + h), yres)); - -out: - pm_runtime_mark_last_busy(info->device); - pm_runtime_put_autosuspend(info->device); -} - -/* - * framebuffer operations - */ - -/* - * this is the slow path from userspace. they can seek and write to - * the fb. it's inefficient to do anything less than a full screen draw - */ -static ssize_t auok190xfb_write(struct fb_info *info, const char __user *buf, - size_t count, loff_t *ppos) -{ - struct auok190xfb_par *par = info->par; - unsigned long p = *ppos; - void *dst; - int err = 0; - unsigned long total_size; - - if (info->state != FBINFO_STATE_RUNNING) - return -EPERM; - - total_size = info->fix.smem_len; - - if (p > total_size) - return -EFBIG; - - if (count > total_size) { - err = -EFBIG; - count = total_size; - } - - if (count + p > total_size) { - if (!err) - err = -ENOSPC; - - count = total_size - p; - } - - dst = (void *)(info->screen_base + p); - - if (copy_from_user(dst, buf, count)) - err = -EFAULT; - - if (!err) - *ppos += count; - - par->update_all(par); - - return (err) ? err : count; -} - -static void auok190xfb_fillrect(struct fb_info *info, - const struct fb_fillrect *rect) -{ - struct auok190xfb_par *par = info->par; - - sys_fillrect(info, rect); - - par->update_all(par); -} - -static void auok190xfb_copyarea(struct fb_info *info, - const struct fb_copyarea *area) -{ - struct auok190xfb_par *par = info->par; - - sys_copyarea(info, area); - - par->update_all(par); -} - -static void auok190xfb_imageblit(struct fb_info *info, - const struct fb_image *image) -{ - struct auok190xfb_par *par = info->par; - - sys_imageblit(info, image); - - par->update_all(par); -} - -static int auok190xfb_check_var(struct fb_var_screeninfo *var, - struct fb_info *info) -{ - struct device *dev = info->device; - struct auok190xfb_par *par = info->par; - struct panel_info *panel = &panel_table[par->resolution]; - int size; - - /* - * Color depth - */ - - if (var->bits_per_pixel == 8 && var->grayscale == 1) { - /* - * For 8-bit grayscale, R, G, and B offset are equal. - */ - var->red.length = 8; - var->red.offset = 0; - var->red.msb_right = 0; - - var->green.length = 8; - var->green.offset = 0; - var->green.msb_right = 0; - - var->blue.length = 8; - var->blue.offset = 0; - var->blue.msb_right = 0; - - var->transp.length = 0; - var->transp.offset = 0; - var->transp.msb_right = 0; - } else if (var->bits_per_pixel == 16) { - var->red.length = 5; - var->red.offset = 11; - var->red.msb_right = 0; - - var->green.length = 6; - var->green.offset = 5; - var->green.msb_right = 0; - - var->blue.length = 5; - var->blue.offset = 0; - var->blue.msb_right = 0; - - var->transp.length = 0; - var->transp.offset = 0; - var->transp.msb_right = 0; - } else { - dev_warn(dev, "unsupported color mode (bits: %d, grayscale: %d)\n", - info->var.bits_per_pixel, info->var.grayscale); - return -EINVAL; - } - - /* - * Dimensions - */ - - switch (var->rotate) { - case FB_ROTATE_UR: - case FB_ROTATE_UD: - var->xres = panel->w; - var->yres = panel->h; - break; - case FB_ROTATE_CW: - case FB_ROTATE_CCW: - var->xres = panel->h; - var->yres = panel->w; - break; - default: - dev_dbg(dev, "Invalid rotation request\n"); - return -EINVAL; - } - - var->xres_virtual = var->xres; - var->yres_virtual = var->yres; - - /* - * Memory limit - */ - - size = var->xres_virtual * var->yres_virtual * var->bits_per_pixel / 8; - if (size > info->fix.smem_len) { - dev_err(dev, "Memory limit exceeded, requested %dK\n", - size >> 10); - return -ENOMEM; - } - - return 0; -} - -static int auok190xfb_set_fix(struct fb_info *info) -{ - struct fb_fix_screeninfo *fix = &info->fix; - struct fb_var_screeninfo *var = &info->var; - - fix->line_length = var->xres_virtual * var->bits_per_pixel / 8; - - fix->type = FB_TYPE_PACKED_PIXELS; - fix->accel = FB_ACCEL_NONE; - fix->visual = (var->grayscale) ? FB_VISUAL_STATIC_PSEUDOCOLOR - : FB_VISUAL_TRUECOLOR; - fix->xpanstep = 0; - fix->ypanstep = 0; - fix->ywrapstep = 0; - - return 0; -} - -static int auok190xfb_set_par(struct fb_info *info) -{ - struct auok190xfb_par *par = info->par; - - par->rotation = info->var.rotate; - auok190xfb_set_fix(info); - - /* reinit the controller to honor the rotation */ - par->init(par); - - /* wait for init to complete */ - par->board->wait_for_rdy(par); - - return 0; -} - -static struct fb_ops auok190xfb_ops = { - .owner = THIS_MODULE, - .fb_read = fb_sys_read, - .fb_write = auok190xfb_write, - .fb_fillrect = auok190xfb_fillrect, - .fb_copyarea = auok190xfb_copyarea, - .fb_imageblit = auok190xfb_imageblit, - .fb_check_var = auok190xfb_check_var, - .fb_set_par = auok190xfb_set_par, -}; - -/* - * Controller-functions common to both K1900 and K1901 - */ - -static int auok190x_read_temperature(struct auok190xfb_par *par) -{ - struct device *dev = par->info->device; - u16 data[4]; - int temp; - - pm_runtime_get_sync(dev); - - mutex_lock(&(par->io_lock)); - - auok190x_read_cmdargs(par, AUOK190X_CMD_READ_VERSION, 4, data); - - mutex_unlock(&(par->io_lock)); - - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - /* sanitize and split of half-degrees for now */ - temp = ((data[0] & AUOK190X_VERSION_TEMP_MASK) >> 1); - - /* handle positive and negative temperatures */ - if (temp >= 201) - return (255 - temp + 1) * (-1); - else - return temp; -} - -static void auok190x_identify(struct auok190xfb_par *par) -{ - struct device *dev = par->info->device; - u16 data[4]; - - pm_runtime_get_sync(dev); - - mutex_lock(&(par->io_lock)); - - auok190x_read_cmdargs(par, AUOK190X_CMD_READ_VERSION, 4, data); - - mutex_unlock(&(par->io_lock)); - - par->epd_type = data[1] & AUOK190X_VERSION_TEMP_MASK; - - par->panel_size_int = AUOK190X_VERSION_SIZE_INT(data[2]); - par->panel_size_float = AUOK190X_VERSION_SIZE_FLOAT(data[2]); - par->panel_model = AUOK190X_VERSION_MODEL(data[2]); - - par->tcon_version = AUOK190X_VERSION_TCON(data[3]); - par->lut_version = AUOK190X_VERSION_LUT(data[3]); - - dev_dbg(dev, "panel %d.%din, model 0x%x, EPD 0x%x TCON-rev 0x%x, LUT-rev 0x%x", - par->panel_size_int, par->panel_size_float, par->panel_model, - par->epd_type, par->tcon_version, par->lut_version); - - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); -} - -/* - * Sysfs functions - */ - -static ssize_t update_mode_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct fb_info *info = dev_get_drvdata(dev); - struct auok190xfb_par *par = info->par; - - return sprintf(buf, "%d\n", par->update_mode); -} - -static ssize_t update_mode_store(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t count) -{ - struct fb_info *info = dev_get_drvdata(dev); - struct auok190xfb_par *par = info->par; - int mode, ret; - - ret = kstrtoint(buf, 10, &mode); - if (ret) - return ret; - - par->update_mode = mode; - - /* if we enter a better mode, do a full update */ - if (par->last_mode > 1 && mode < par->last_mode) - par->update_all(par); - - return count; -} - -static ssize_t flash_show(struct device *dev, struct device_attribute *attr, - char *buf) -{ - struct fb_info *info = dev_get_drvdata(dev); - struct auok190xfb_par *par = info->par; - - return sprintf(buf, "%d\n", par->flash); -} - -static ssize_t flash_store(struct device *dev, struct device_attribute *attr, - const char *buf, size_t count) -{ - struct fb_info *info = dev_get_drvdata(dev); - struct auok190xfb_par *par = info->par; - int flash, ret; - - ret = kstrtoint(buf, 10, &flash); - if (ret) - return ret; - - if (flash > 0) - par->flash = 1; - else - par->flash = 0; - - return count; -} - -static ssize_t temp_show(struct device *dev, struct device_attribute *attr, - char *buf) -{ - struct fb_info *info = dev_get_drvdata(dev); - struct auok190xfb_par *par = info->par; - int temp; - - temp = auok190x_read_temperature(par); - return sprintf(buf, "%d\n", temp); -} - -static DEVICE_ATTR_RW(update_mode); -static DEVICE_ATTR_RW(flash); -static DEVICE_ATTR(temp, 0644, temp_show, NULL); - -static struct attribute *auok190x_attributes[] = { - &dev_attr_update_mode.attr, - &dev_attr_flash.attr, - &dev_attr_temp.attr, - NULL -}; - -static const struct attribute_group auok190x_attr_group = { - .attrs = auok190x_attributes, -}; - -static int auok190x_power(struct auok190xfb_par *par, bool on) -{ - struct auok190x_board *board = par->board; - int ret; - - if (on) { - /* We should maintain POWER up for at least 80ms before set - * RST_N and SLP_N to high (TCON spec 20100803_v35 p59) - */ - ret = regulator_enable(par->regulator); - if (ret) - return ret; - - msleep(200); - gpio_set_value(board->gpio_nrst, 1); - gpio_set_value(board->gpio_nsleep, 1); - msleep(200); - } else { - regulator_disable(par->regulator); - gpio_set_value(board->gpio_nrst, 0); - gpio_set_value(board->gpio_nsleep, 0); - } - - return 0; -} - -/* - * Recovery - powercycle the controller - */ - -static void auok190x_recover(struct auok190xfb_par *par) -{ - struct device *dev = par->info->device; - - auok190x_power(par, 0); - msleep(100); - auok190x_power(par, 1); - - /* after powercycling the device, it's always active */ - pm_runtime_set_active(dev); - par->standby = 0; - - par->init(par); - - /* wait for init to complete */ - par->board->wait_for_rdy(par); -} - -/* - * Power-management - */ -static int __maybe_unused auok190x_runtime_suspend(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct fb_info *info = platform_get_drvdata(pdev); - struct auok190xfb_par *par = info->par; - struct auok190x_board *board = par->board; - u16 standby_param; - - /* take and keep the lock until we are resumed, as the controller - * will never reach the non-busy state when in standby mode - */ - mutex_lock(&(par->io_lock)); - - if (par->standby) { - dev_warn(dev, "already in standby, runtime-pm pairing mismatch\n"); - mutex_unlock(&(par->io_lock)); - return 0; - } - - /* according to runtime_pm.txt runtime_suspend only means, that the - * device will not process data and will not communicate with the CPU - * As we hold the lock, this stays true even without standby - */ - if (board->quirks & AUOK190X_QUIRK_STANDBYBROKEN) { - dev_dbg(dev, "runtime suspend without standby\n"); - goto finish; - } else if (board->quirks & AUOK190X_QUIRK_STANDBYPARAM) { - /* for some TCON versions STANDBY expects a parameter (0) but - * it seems the real tcon version has to be determined yet. - */ - dev_dbg(dev, "runtime suspend with additional empty param\n"); - standby_param = 0; - auok190x_send_cmdargs(par, AUOK190X_CMD_STANDBY, 1, - &standby_param); - } else { - dev_dbg(dev, "runtime suspend without param\n"); - auok190x_send_command(par, AUOK190X_CMD_STANDBY); - } - - msleep(64); - -finish: - par->standby = 1; - - return 0; -} - -static int __maybe_unused auok190x_runtime_resume(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct fb_info *info = platform_get_drvdata(pdev); - struct auok190xfb_par *par = info->par; - struct auok190x_board *board = par->board; - - if (!par->standby) { - dev_warn(dev, "not in standby, runtime-pm pairing mismatch\n"); - return 0; - } - - if (board->quirks & AUOK190X_QUIRK_STANDBYBROKEN) { - dev_dbg(dev, "runtime resume without standby\n"); - } else { - /* when in standby, controller is always busy - * and only accepts the wakeup command - */ - dev_dbg(dev, "runtime resume from standby\n"); - auok190x_send_command_nowait(par, AUOK190X_CMD_WAKEUP); - - msleep(160); - - /* wait for the controller to be ready and release the lock */ - board->wait_for_rdy(par); - } - - par->standby = 0; - - mutex_unlock(&(par->io_lock)); - - return 0; -} - -static int __maybe_unused auok190x_suspend(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct fb_info *info = platform_get_drvdata(pdev); - struct auok190xfb_par *par = info->par; - struct auok190x_board *board = par->board; - int ret; - - dev_dbg(dev, "suspend\n"); - if (board->quirks & AUOK190X_QUIRK_STANDBYBROKEN) { - /* suspend via powering off the ic */ - dev_dbg(dev, "suspend with broken standby\n"); - - auok190x_power(par, 0); - } else { - dev_dbg(dev, "suspend using sleep\n"); - - /* the sleep state can only be entered from the standby state. - * pm_runtime_get_noresume gets called before the suspend call. - * So the devices usage count is >0 but it is not necessarily - * active. - */ - if (!pm_runtime_status_suspended(dev)) { - ret = auok190x_runtime_suspend(dev); - if (ret < 0) { - dev_err(dev, "auok190x_runtime_suspend failed with %d\n", - ret); - return ret; - } - par->manual_standby = 1; - } - - gpio_direction_output(board->gpio_nsleep, 0); - } - - msleep(100); - - return 0; -} - -static int __maybe_unused auok190x_resume(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct fb_info *info = platform_get_drvdata(pdev); - struct auok190xfb_par *par = info->par; - struct auok190x_board *board = par->board; - - dev_dbg(dev, "resume\n"); - if (board->quirks & AUOK190X_QUIRK_STANDBYBROKEN) { - dev_dbg(dev, "resume with broken standby\n"); - - auok190x_power(par, 1); - - par->init(par); - } else { - dev_dbg(dev, "resume from sleep\n"); - - /* device should be in runtime suspend when we were suspended - * and pm_runtime_put_sync gets called after this function. - * So there is no need to touch the standby mode here at all. - */ - gpio_direction_output(board->gpio_nsleep, 1); - msleep(100); - - /* an additional init call seems to be necessary after sleep */ - auok190x_runtime_resume(dev); - par->init(par); - - /* if we were runtime-suspended before, suspend again*/ - if (!par->manual_standby) - auok190x_runtime_suspend(dev); - else - par->manual_standby = 0; - } - - return 0; -} - -const struct dev_pm_ops auok190x_pm = { - SET_RUNTIME_PM_OPS(auok190x_runtime_suspend, auok190x_runtime_resume, - NULL) - SET_SYSTEM_SLEEP_PM_OPS(auok190x_suspend, auok190x_resume) -}; -EXPORT_SYMBOL_GPL(auok190x_pm); - -/* - * Common probe and remove code - */ - -int auok190x_common_probe(struct platform_device *pdev, - struct auok190x_init_data *init) -{ - struct auok190x_board *board = init->board; - struct auok190xfb_par *par; - struct fb_info *info; - struct panel_info *panel; - int videomemorysize, ret; - unsigned char *videomemory; - - /* check board contents */ - if (!board->init || !board->cleanup || !board->wait_for_rdy - || !board->set_ctl || !board->set_hdb || !board->get_hdb - || !board->setup_irq) - return -EINVAL; - - info = framebuffer_alloc(sizeof(struct auok190xfb_par), &pdev->dev); - if (!info) - return -ENOMEM; - - par = info->par; - par->info = info; - par->board = board; - par->recover = auok190x_recover; - par->update_partial = init->update_partial; - par->update_all = init->update_all; - par->need_refresh = init->need_refresh; - par->init = init->init; - - /* init update modes */ - par->update_cnt = 0; - par->update_mode = -1; - par->last_mode = -1; - par->flash = 0; - - par->regulator = regulator_get(info->device, "vdd"); - if (IS_ERR(par->regulator)) { - ret = PTR_ERR(par->regulator); - dev_err(info->device, "Failed to get regulator: %d\n", ret); - goto err_reg; - } - - ret = board->init(par); - if (ret) { - dev_err(info->device, "board init failed, %d\n", ret); - goto err_board; - } - - ret = gpio_request(board->gpio_nsleep, "AUOK190x sleep"); - if (ret) { - dev_err(info->device, "could not request sleep gpio, %d\n", - ret); - goto err_gpio1; - } - - ret = gpio_direction_output(board->gpio_nsleep, 0); - if (ret) { - dev_err(info->device, "could not set sleep gpio, %d\n", ret); - goto err_gpio2; - } - - ret = gpio_request(board->gpio_nrst, "AUOK190x reset"); - if (ret) { - dev_err(info->device, "could not request reset gpio, %d\n", - ret); - goto err_gpio2; - } - - ret = gpio_direction_output(board->gpio_nrst, 0); - if (ret) { - dev_err(info->device, "could not set reset gpio, %d\n", ret); - goto err_gpio3; - } - - ret = auok190x_power(par, 1); - if (ret) { - dev_err(info->device, "could not power on the device, %d\n", - ret); - goto err_gpio3; - } - - mutex_init(&par->io_lock); - - init_waitqueue_head(&par->waitq); - - ret = par->board->setup_irq(par->info); - if (ret) { - dev_err(info->device, "could not setup ready-irq, %d\n", ret); - goto err_irq; - } - - /* wait for init to complete */ - par->board->wait_for_rdy(par); - - /* - * From here on the controller can talk to us - */ - - /* initialise fix, var, resolution and rotation */ - - strlcpy(info->fix.id, init->id, 16); - info->var.bits_per_pixel = 8; - info->var.grayscale = 1; - - panel = &panel_table[board->resolution]; - - par->resolution = board->resolution; - par->rotation = 0; - - /* videomemory handling */ - - videomemorysize = roundup((panel->w * panel->h) * 2, PAGE_SIZE); - videomemory = vzalloc(videomemorysize); - if (!videomemory) { - ret = -ENOMEM; - goto err_irq; - } - - info->screen_base = (char *)videomemory; - info->fix.smem_len = videomemorysize; - - info->flags = FBINFO_FLAG_DEFAULT | FBINFO_VIRTFB; - info->fbops = &auok190xfb_ops; - - ret = auok190xfb_check_var(&info->var, info); - if (ret) - goto err_defio; - - auok190xfb_set_fix(info); - - /* deferred io init */ - - info->fbdefio = devm_kzalloc(info->device, - sizeof(struct fb_deferred_io), - GFP_KERNEL); - if (!info->fbdefio) { - dev_err(info->device, "Failed to allocate memory\n"); - ret = -ENOMEM; - goto err_defio; - } - - dev_dbg(info->device, "targeting %d frames per second\n", board->fps); - info->fbdefio->delay = HZ / board->fps; - info->fbdefio->first_io = auok190xfb_dpy_first_io, - info->fbdefio->deferred_io = auok190xfb_dpy_deferred_io, - fb_deferred_io_init(info); - - /* color map */ - - ret = fb_alloc_cmap(&info->cmap, 256, 0); - if (ret < 0) { - dev_err(info->device, "Failed to allocate colormap\n"); - goto err_cmap; - } - - /* controller init */ - - par->consecutive_threshold = 100; - par->init(par); - auok190x_identify(par); - - platform_set_drvdata(pdev, info); - - ret = register_framebuffer(info); - if (ret < 0) - goto err_regfb; - - ret = sysfs_create_group(&info->device->kobj, &auok190x_attr_group); - if (ret) - goto err_sysfs; - - dev_info(info->device, "fb%d: %dx%d using %dK of video memory\n", - info->node, info->var.xres, info->var.yres, - videomemorysize >> 10); - - /* increase autosuspend_delay when we use alternative methods - * for runtime_pm - */ - par->autosuspend_delay = (board->quirks & AUOK190X_QUIRK_STANDBYBROKEN) - ? 1000 : 200; - - pm_runtime_set_active(info->device); - pm_runtime_enable(info->device); - pm_runtime_set_autosuspend_delay(info->device, par->autosuspend_delay); - pm_runtime_use_autosuspend(info->device); - - return 0; - -err_sysfs: - unregister_framebuffer(info); -err_regfb: - fb_dealloc_cmap(&info->cmap); -err_cmap: - fb_deferred_io_cleanup(info); -err_defio: - vfree((void *)info->screen_base); -err_irq: - auok190x_power(par, 0); -err_gpio3: - gpio_free(board->gpio_nrst); -err_gpio2: - gpio_free(board->gpio_nsleep); -err_gpio1: - board->cleanup(par); -err_board: - regulator_put(par->regulator); -err_reg: - framebuffer_release(info); - - return ret; -} -EXPORT_SYMBOL_GPL(auok190x_common_probe); - -int auok190x_common_remove(struct platform_device *pdev) -{ - struct fb_info *info = platform_get_drvdata(pdev); - struct auok190xfb_par *par = info->par; - struct auok190x_board *board = par->board; - - pm_runtime_disable(info->device); - - sysfs_remove_group(&info->device->kobj, &auok190x_attr_group); - - unregister_framebuffer(info); - - fb_dealloc_cmap(&info->cmap); - - fb_deferred_io_cleanup(info); - - vfree((void *)info->screen_base); - - auok190x_power(par, 0); - - gpio_free(board->gpio_nrst); - gpio_free(board->gpio_nsleep); - - board->cleanup(par); - - regulator_put(par->regulator); - - framebuffer_release(info); - - return 0; -} -EXPORT_SYMBOL_GPL(auok190x_common_remove); - -MODULE_DESCRIPTION("Common code for AUO-K190X controllers"); -MODULE_AUTHOR("Heiko Stuebner <heiko@sntech.de>"); -MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/auo_k190x.h b/drivers/video/fbdev/auo_k190x.h deleted file mode 100644 index e35af1f51b28..000000000000 --- a/drivers/video/fbdev/auo_k190x.h +++ /dev/null @@ -1,129 +0,0 @@ -/* - * Private common definitions for AUO-K190X framebuffer drivers - * - * Copyright (C) 2012 Heiko Stuebner <heiko@sntech.de> - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - */ - -/* - * I80 interface specific defines - */ - -#define AUOK190X_I80_CS 0x01 -#define AUOK190X_I80_DC 0x02 -#define AUOK190X_I80_WR 0x03 -#define AUOK190X_I80_OE 0x04 - -/* - * AUOK190x commands, common to both controllers - */ - -#define AUOK190X_CMD_INIT 0x0000 -#define AUOK190X_CMD_STANDBY 0x0001 -#define AUOK190X_CMD_WAKEUP 0x0002 -#define AUOK190X_CMD_TCON_RESET 0x0003 -#define AUOK190X_CMD_DATA_STOP 0x1002 -#define AUOK190X_CMD_LUT_START 0x1003 -#define AUOK190X_CMD_DISP_REFRESH 0x1004 -#define AUOK190X_CMD_DISP_RESET 0x1005 -#define AUOK190X_CMD_PRE_DISPLAY_START 0x100D -#define AUOK190X_CMD_PRE_DISPLAY_STOP 0x100F -#define AUOK190X_CMD_FLASH_W 0x2000 -#define AUOK190X_CMD_FLASH_E 0x2001 -#define AUOK190X_CMD_FLASH_STS 0x2002 -#define AUOK190X_CMD_FRAMERATE 0x3000 -#define AUOK190X_CMD_READ_VERSION 0x4000 -#define AUOK190X_CMD_READ_STATUS 0x4001 -#define AUOK190X_CMD_READ_LUT 0x4003 -#define AUOK190X_CMD_DRIVERTIMING 0x5000 -#define AUOK190X_CMD_LBALANCE 0x5001 -#define AUOK190X_CMD_AGINGMODE 0x6000 -#define AUOK190X_CMD_AGINGEXIT 0x6001 - -/* - * Common settings for AUOK190X_CMD_INIT - */ - -#define AUOK190X_INIT_DATA_FILTER (0 << 12) -#define AUOK190X_INIT_DATA_BYPASS (1 << 12) -#define AUOK190X_INIT_INVERSE_WHITE (0 << 9) -#define AUOK190X_INIT_INVERSE_BLACK (1 << 9) -#define AUOK190X_INIT_SCAN_DOWN (0 << 1) -#define AUOK190X_INIT_SCAN_UP (1 << 1) -#define AUOK190X_INIT_SHIFT_LEFT (0 << 0) -#define AUOK190X_INIT_SHIFT_RIGHT (1 << 0) - -/* Common bits to pixels - * Mode 15-12 11-8 7-4 3-0 - * format0 4 3 2 1 - * format1 3 4 1 2 - */ - -#define AUOK190X_INIT_FORMAT0 0 -#define AUOK190X_INIT_FORMAT1 (1 << 6) - -/* - * settings for AUOK190X_CMD_RESET - */ - -#define AUOK190X_RESET_TCON (0 << 0) -#define AUOK190X_RESET_NORMAL (1 << 0) -#define AUOK190X_RESET_PON (1 << 1) - -/* - * AUOK190X_CMD_VERSION - */ - -#define AUOK190X_VERSION_TEMP_MASK (0x1ff) -#define AUOK190X_VERSION_EPD_MASK (0xff) -#define AUOK190X_VERSION_SIZE_INT(_val) ((_val & 0xfc00) >> 10) -#define AUOK190X_VERSION_SIZE_FLOAT(_val) ((_val & 0x3c0) >> 6) -#define AUOK190X_VERSION_MODEL(_val) (_val & 0x3f) -#define AUOK190X_VERSION_LUT(_val) (_val & 0xff) -#define AUOK190X_VERSION_TCON(_val) ((_val & 0xff00) >> 8) - -/* - * update modes for CMD_PARTIALDISP on K1900 and CMD_DDMA on K1901 - */ - -#define AUOK190X_UPDATE_MODE(_res) ((_res & 0x7) << 12) -#define AUOK190X_UPDATE_NONFLASH (1 << 15) - -/* - * track panel specific parameters for common init - */ - -struct auok190x_init_data { - char *id; - struct auok190x_board *board; - - void (*update_partial)(struct auok190xfb_par *par, u16 y1, u16 y2); - void (*update_all)(struct auok190xfb_par *par); - bool (*need_refresh)(struct auok190xfb_par *par); - void (*init)(struct auok190xfb_par *par); -}; - - -extern void auok190x_send_command_nowait(struct auok190xfb_par *par, u16 data); -extern int auok190x_send_command(struct auok190xfb_par *par, u16 data); -extern void auok190x_send_cmdargs_nowait(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv); -extern int auok190x_send_cmdargs(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv); -extern void auok190x_send_cmdargs_pixels_nowait(struct auok190xfb_par *par, - u16 cmd, int argc, u16 *argv, - int size, u16 *data); -extern int auok190x_send_cmdargs_pixels(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv, int size, - u16 *data); -extern int auok190x_read_cmdargs(struct auok190xfb_par *par, u16 cmd, - int argc, u16 *argv); - -extern int auok190x_common_probe(struct platform_device *pdev, - struct auok190x_init_data *init); -extern int auok190x_common_remove(struct platform_device *pdev); - -extern const struct dev_pm_ops auok190x_pm; diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c index 487d5e336e1b..82c20c6047b0 100644 --- a/drivers/video/fbdev/core/fb_defio.c +++ b/drivers/video/fbdev/core/fb_defio.c @@ -37,7 +37,7 @@ static struct page *fb_deferred_io_page(struct fb_info *info, unsigned long offs } /* this is to find and return the vmalloc-ed fb pages */ -static int fb_deferred_io_fault(struct vm_fault *vmf) +static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf) { unsigned long offset; struct page *page; @@ -90,7 +90,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasy EXPORT_SYMBOL_GPL(fb_deferred_io_fsync); /* vm_ops->page_mkwrite handler */ -static int fb_deferred_io_mkwrite(struct vm_fault *vmf) +static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf) { struct page *page = vmf->page; struct fb_info *info = vmf->vma->vm_private_data; diff --git a/drivers/video/fbdev/mmp/fb/mmpfb.c b/drivers/video/fbdev/mmp/fb/mmpfb.c index f27697e07c55..ee212be67dc6 100644 --- a/drivers/video/fbdev/mmp/fb/mmpfb.c +++ b/drivers/video/fbdev/mmp/fb/mmpfb.c @@ -495,10 +495,9 @@ static int modes_setup(struct mmpfb_info *fbi) /* put videomode list to info structure */ videomodes = kcalloc(videomode_num, sizeof(struct fb_videomode), GFP_KERNEL); - if (!videomodes) { - dev_err(fbi->dev, "can't malloc video modes\n"); + if (!videomodes) return -ENOMEM; - } + for (i = 0; i < videomode_num; i++) mmpmode_to_fbmode(&videomodes[i], &mmp_modes[i]); fb_videomode_to_modelist(videomodes, videomode_num, &info->modelist); diff --git a/drivers/video/fbdev/mmp/hw/mmp_ctrl.c b/drivers/video/fbdev/mmp/hw/mmp_ctrl.c index b6f83d5df9fd..fcdbb2df137f 100644 --- a/drivers/video/fbdev/mmp/hw/mmp_ctrl.c +++ b/drivers/video/fbdev/mmp/hw/mmp_ctrl.c @@ -406,12 +406,10 @@ static int path_init(struct mmphw_path_plat *path_plat, dev_info(ctrl->dev, "%s: %s\n", __func__, config->name); /* init driver data */ - path_info = kzalloc(sizeof(struct mmp_path_info), GFP_KERNEL); - if (!path_info) { - dev_err(ctrl->dev, "%s: unable to alloc path_info for %s\n", - __func__, config->name); + path_info = kzalloc(sizeof(*path_info), GFP_KERNEL); + if (!path_info) return 0; - } + path_info->name = config->name; path_info->id = path_plat->id; path_info->dev = ctrl->dev; diff --git a/drivers/video/fbdev/nvidia/nvidia.c b/drivers/video/fbdev/nvidia/nvidia.c index 2e50120bcfae..fbeeed5afe35 100644 --- a/drivers/video/fbdev/nvidia/nvidia.c +++ b/drivers/video/fbdev/nvidia/nvidia.c @@ -1548,7 +1548,7 @@ MODULE_PARM_DESC(noaccel, "(default=0)"); module_param(noscale, int, 0); MODULE_PARM_DESC(noscale, - "Disables screen scaleing. (0 or 1=disable) " + "Disables screen scaling. (0 or 1=disable) " "(default=0, do scaling)"); module_param(paneltweak, int, 0); MODULE_PARM_DESC(paneltweak, diff --git a/drivers/video/fbdev/omap/lcd_ams_delta.c b/drivers/video/fbdev/omap/lcd_ams_delta.c index a4ee947006c7..e8c748a0dfe2 100644 --- a/drivers/video/fbdev/omap/lcd_ams_delta.c +++ b/drivers/video/fbdev/omap/lcd_ams_delta.c @@ -197,3 +197,7 @@ static struct platform_driver ams_delta_panel_driver = { }; module_platform_driver(ams_delta_panel_driver); + +MODULE_AUTHOR("Jonathan McDowell <noodles@earth.li>"); +MODULE_DESCRIPTION("LCD panel support for the Amstrad E3 (Delta) videophone"); +MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/omap/lcd_h3.c b/drivers/video/fbdev/omap/lcd_h3.c index 796f4634c4c6..fd0ac997fb8c 100644 --- a/drivers/video/fbdev/omap/lcd_h3.c +++ b/drivers/video/fbdev/omap/lcd_h3.c @@ -89,3 +89,7 @@ static struct platform_driver h3_panel_driver = { }; module_platform_driver(h3_panel_driver); + +MODULE_AUTHOR("Imre Deak"); +MODULE_DESCRIPTION("LCD panel support for the TI OMAP H3 board"); +MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/omap/lcd_htcherald.c b/drivers/video/fbdev/omap/lcd_htcherald.c index 9d692f5b8025..db4ff1c6add9 100644 --- a/drivers/video/fbdev/omap/lcd_htcherald.c +++ b/drivers/video/fbdev/omap/lcd_htcherald.c @@ -66,3 +66,7 @@ static struct platform_driver htcherald_panel_driver = { }; module_platform_driver(htcherald_panel_driver); + +MODULE_AUTHOR("Cory Maccarrone"); +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("LCD panel support for the HTC Herald"); diff --git a/drivers/video/fbdev/omap/lcd_inn1510.c b/drivers/video/fbdev/omap/lcd_inn1510.c index b284050f5471..1ea775f17bc1 100644 --- a/drivers/video/fbdev/omap/lcd_inn1510.c +++ b/drivers/video/fbdev/omap/lcd_inn1510.c @@ -73,3 +73,7 @@ static struct platform_driver innovator1510_panel_driver = { }; module_platform_driver(innovator1510_panel_driver); + +MODULE_AUTHOR("Imre Deak"); +MODULE_DESCRIPTION("LCD panel support for the TI OMAP1510 Innovator board"); +MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/omap/lcd_inn1610.c b/drivers/video/fbdev/omap/lcd_inn1610.c index 1841710e796f..8d0cf68d2de3 100644 --- a/drivers/video/fbdev/omap/lcd_inn1610.c +++ b/drivers/video/fbdev/omap/lcd_inn1610.c @@ -106,3 +106,7 @@ static struct platform_driver innovator1610_panel_driver = { }; module_platform_driver(innovator1610_panel_driver); + +MODULE_AUTHOR("Imre Deak"); +MODULE_DESCRIPTION("LCD panel support for the TI OMAP1610 Innovator board"); +MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/omap/lcd_osk.c b/drivers/video/fbdev/omap/lcd_osk.c index b0be5771fe90..9fc43a14957d 100644 --- a/drivers/video/fbdev/omap/lcd_osk.c +++ b/drivers/video/fbdev/omap/lcd_osk.c @@ -93,3 +93,7 @@ static struct platform_driver osk_panel_driver = { }; module_platform_driver(osk_panel_driver); + +MODULE_AUTHOR("Imre Deak"); +MODULE_DESCRIPTION("LCD panel support for the TI OMAP OSK board"); +MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/omap/lcd_palmte.c b/drivers/video/fbdev/omap/lcd_palmte.c index cef96386cf80..a0e888643131 100644 --- a/drivers/video/fbdev/omap/lcd_palmte.c +++ b/drivers/video/fbdev/omap/lcd_palmte.c @@ -59,3 +59,7 @@ static struct platform_driver palmte_panel_driver = { }; module_platform_driver(palmte_panel_driver); + +MODULE_AUTHOR("Romain Goyet <r.goyet@gmail.com>, Laurent Gonzalez <palmte.linux@free.fr>"); +MODULE_DESCRIPTION("LCD panel support for the Palm Tungsten E"); +MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/omap/lcd_palmtt.c b/drivers/video/fbdev/omap/lcd_palmtt.c index 627f13dae5ad..2c45375e456f 100644 --- a/drivers/video/fbdev/omap/lcd_palmtt.c +++ b/drivers/video/fbdev/omap/lcd_palmtt.c @@ -72,3 +72,7 @@ static struct platform_driver palmtt_panel_driver = { }; module_platform_driver(palmtt_panel_driver); + +MODULE_AUTHOR("Marek Vasut <marek.vasut@gmail.com>"); +MODULE_DESCRIPTION("LCD panel support for Palm Tungsten|T"); +MODULE_LICENSE("GPL"); diff --git a/drivers/video/fbdev/omap/lcd_palmz71.c b/drivers/video/fbdev/omap/lcd_palmz71.c index c46d4db1f839..c99a15ab1826 100644 --- a/drivers/video/fbdev/omap/lcd_palmz71.c +++ b/drivers/video/fbdev/omap/lcd_palmz71.c @@ -66,3 +66,7 @@ static struct platform_driver palmz71_panel_driver = { }; module_platform_driver(palmz71_panel_driver); + +MODULE_AUTHOR("Romain Goyet, Laurent Gonzalez, Marek Vasut"); +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("LCD panel support for the Palm Zire71"); diff --git a/drivers/video/fbdev/omap/omapfb_main.c b/drivers/video/fbdev/omap/omapfb_main.c index 3479a47a3082..585f39efcff6 100644 --- a/drivers/video/fbdev/omap/omapfb_main.c +++ b/drivers/video/fbdev/omap/omapfb_main.c @@ -1645,7 +1645,7 @@ static int omapfb_do_probe(struct platform_device *pdev, goto cleanup; } - fbdev = kzalloc(sizeof(struct omapfb_device), GFP_KERNEL); + fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL); if (fbdev == NULL) { dev_err(&pdev->dev, "unable to allocate memory for device info\n"); diff --git a/drivers/video/fbdev/omap2/omapfb/Kconfig b/drivers/video/fbdev/omap2/omapfb/Kconfig index e6226aeed17e..3bf154e676d1 100644 --- a/drivers/video/fbdev/omap2/omapfb/Kconfig +++ b/drivers/video/fbdev/omap2/omapfb/Kconfig @@ -5,6 +5,7 @@ menuconfig FB_OMAP2 tristate "OMAP2+ frame buffer support" depends on FB depends on DRM_OMAP = n + depends on GPIOLIB select FB_OMAP2_DSS select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3 diff --git a/drivers/video/fbdev/omap2/omapfb/displays/panel-dsi-cm.c b/drivers/video/fbdev/omap2/omapfb/displays/panel-dsi-cm.c index bef431530090..87497a00241f 100644 --- a/drivers/video/fbdev/omap2/omapfb/displays/panel-dsi-cm.c +++ b/drivers/video/fbdev/omap2/omapfb/displays/panel-dsi-cm.c @@ -387,8 +387,7 @@ static void dsicm_get_resolution(struct omap_dss_device *dssdev, static ssize_t dsicm_num_errors_show(struct device *dev, struct device_attribute *attr, char *buf) { - struct platform_device *pdev = to_platform_device(dev); - struct panel_drv_data *ddata = platform_get_drvdata(pdev); + struct panel_drv_data *ddata = dev_get_drvdata(dev); struct omap_dss_device *in = ddata->in; u8 errors = 0; int r; @@ -419,8 +418,7 @@ static ssize_t dsicm_num_errors_show(struct device *dev, static ssize_t dsicm_hw_revision_show(struct device *dev, struct device_attribute *attr, char *buf) { - struct platform_device *pdev = to_platform_device(dev); - struct panel_drv_data *ddata = platform_get_drvdata(pdev); + struct panel_drv_data *ddata = dev_get_drvdata(dev); struct omap_dss_device *in = ddata->in; u8 id1, id2, id3; int r; @@ -451,8 +449,7 @@ static ssize_t dsicm_store_ulps(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { - struct platform_device *pdev = to_platform_device(dev); - struct panel_drv_data *ddata = platform_get_drvdata(pdev); + struct panel_drv_data *ddata = dev_get_drvdata(dev); struct omap_dss_device *in = ddata->in; unsigned long t; int r; @@ -486,8 +483,7 @@ static ssize_t dsicm_show_ulps(struct device *dev, struct device_attribute *attr, char *buf) { - struct platform_device *pdev = to_platform_device(dev); - struct panel_drv_data *ddata = platform_get_drvdata(pdev); + struct panel_drv_data *ddata = dev_get_drvdata(dev); unsigned t; mutex_lock(&ddata->lock); @@ -501,8 +497,7 @@ static ssize_t dsicm_store_ulps_timeout(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { - struct platform_device *pdev = to_platform_device(dev); - struct panel_drv_data *ddata = platform_get_drvdata(pdev); + struct panel_drv_data *ddata = dev_get_drvdata(dev); struct omap_dss_device *in = ddata->in; unsigned long t; int r; @@ -533,8 +528,7 @@ static ssize_t dsicm_show_ulps_timeout(struct device *dev, struct device_attribute *attr, char *buf) { - struct platform_device *pdev = to_platform_device(dev); - struct panel_drv_data *ddata = platform_get_drvdata(pdev); + struct panel_drv_data *ddata = dev_get_drvdata(dev); unsigned t; mutex_lock(&ddata->lock); diff --git a/drivers/video/fbdev/pxafb.c b/drivers/video/fbdev/pxafb.c index c3d49e13643c..76722a59f55e 100644 --- a/drivers/video/fbdev/pxafb.c +++ b/drivers/video/fbdev/pxafb.c @@ -2115,12 +2115,10 @@ static int of_get_pxafb_display(struct device *dev, struct device_node *disp, if (ret) s = "color-tft"; - for (i = 0; lcd_types[i]; i++) - if (!strcmp(s, lcd_types[i])) - break; - if (!i || !lcd_types[i]) { + i = match_string(lcd_types, -1, s); + if (i < 0) { dev_err(dev, "lcd-type %s is unknown\n", s); - return -EINVAL; + return i; } info->lcd_conn |= LCD_CONN_TYPE(i); info->lcd_conn |= LCD_CONN_WIDTH(bus_width); diff --git a/drivers/video/fbdev/savage/savagefb_driver.c b/drivers/video/fbdev/savage/savagefb_driver.c index c20468362f11..c09d7426cd92 100644 --- a/drivers/video/fbdev/savage/savagefb_driver.c +++ b/drivers/video/fbdev/savage/savagefb_driver.c @@ -1892,11 +1892,11 @@ static int savage_init_hw(struct savagefb_par *par) vga_out8(0x3d4, 0x66, par); cr66 = vga_in8(0x3d5, par); vga_out8(0x3d5, cr66 | 0x02, par); - mdelay(10); + usleep_range(10000, 11000); vga_out8(0x3d4, 0x66, par); vga_out8(0x3d5, cr66 & ~0x02, par); /* clear reset flag */ - mdelay(10); + usleep_range(10000, 11000); /* @@ -1906,11 +1906,11 @@ static int savage_init_hw(struct savagefb_par *par) vga_out8(0x3d4, 0x3f, par); cr3f = vga_in8(0x3d5, par); vga_out8(0x3d5, cr3f | 0x08, par); - mdelay(10); + usleep_range(10000, 11000); vga_out8(0x3d4, 0x3f, par); vga_out8(0x3d5, cr3f & ~0x08, par); /* clear reset flags */ - mdelay(10); + usleep_range(10000, 11000); /* Savage ramdac speeds */ par->numClocks = 4; diff --git a/drivers/video/fbdev/sh_mobile_lcdcfb.c b/drivers/video/fbdev/sh_mobile_lcdcfb.c index c3a46506e47e..dc46be38c970 100644 --- a/drivers/video/fbdev/sh_mobile_lcdcfb.c +++ b/drivers/video/fbdev/sh_mobile_lcdcfb.c @@ -29,7 +29,6 @@ #include <linux/vmalloc.h> #include <video/sh_mobile_lcdc.h> -#include <video/sh_mobile_meram.h> #include "sh_mobile_lcdcfb.h" @@ -217,7 +216,6 @@ struct sh_mobile_lcdc_priv { struct notifier_block notifier; int started; int forced_fourcc; /* 2 channel LCDC must share fourcc setting */ - struct sh_mobile_meram_info *meram_dev; }; /* ----------------------------------------------------------------------------- @@ -346,16 +344,12 @@ static void sh_mobile_lcdc_clk_on(struct sh_mobile_lcdc_priv *priv) if (priv->dot_clk) clk_prepare_enable(priv->dot_clk); pm_runtime_get_sync(priv->dev); - if (priv->meram_dev && priv->meram_dev->pdev) - pm_runtime_get_sync(&priv->meram_dev->pdev->dev); } } static void sh_mobile_lcdc_clk_off(struct sh_mobile_lcdc_priv *priv) { if (atomic_sub_return(1, &priv->hw_usecnt) == -1) { - if (priv->meram_dev && priv->meram_dev->pdev) - pm_runtime_put_sync(&priv->meram_dev->pdev->dev); pm_runtime_put(priv->dev); if (priv->dot_clk) clk_disable_unprepare(priv->dot_clk); @@ -1073,7 +1067,6 @@ static void __sh_mobile_lcdc_start(struct sh_mobile_lcdc_priv *priv) static int sh_mobile_lcdc_start(struct sh_mobile_lcdc_priv *priv) { - struct sh_mobile_meram_info *mdev = priv->meram_dev; struct sh_mobile_lcdc_chan *ch; unsigned long tmp; int ret; @@ -1106,9 +1099,6 @@ static int sh_mobile_lcdc_start(struct sh_mobile_lcdc_priv *priv) /* Compute frame buffer base address and pitch for each channel. */ for (k = 0; k < ARRAY_SIZE(priv->ch); k++) { - int pixelformat; - void *cache; - ch = &priv->ch[k]; if (!ch->enabled) continue; @@ -1117,45 +1107,6 @@ static int sh_mobile_lcdc_start(struct sh_mobile_lcdc_priv *priv) ch->base_addr_c = ch->dma_handle + ch->xres_virtual * ch->yres_virtual; ch->line_size = ch->pitch; - - /* Enable MERAM if possible. */ - if (mdev == NULL || ch->cfg->meram_cfg == NULL) - continue; - - /* Free the allocated MERAM cache. */ - if (ch->cache) { - sh_mobile_meram_cache_free(mdev, ch->cache); - ch->cache = NULL; - } - - switch (ch->format->fourcc) { - case V4L2_PIX_FMT_NV12: - case V4L2_PIX_FMT_NV21: - case V4L2_PIX_FMT_NV16: - case V4L2_PIX_FMT_NV61: - pixelformat = SH_MOBILE_MERAM_PF_NV; - break; - case V4L2_PIX_FMT_NV24: - case V4L2_PIX_FMT_NV42: - pixelformat = SH_MOBILE_MERAM_PF_NV24; - break; - case V4L2_PIX_FMT_RGB565: - case V4L2_PIX_FMT_BGR24: - case V4L2_PIX_FMT_BGR32: - default: - pixelformat = SH_MOBILE_MERAM_PF_RGB; - break; - } - - cache = sh_mobile_meram_cache_alloc(mdev, ch->cfg->meram_cfg, - ch->pitch, ch->yres, pixelformat, - &ch->line_size); - if (!IS_ERR(cache)) { - sh_mobile_meram_cache_update(mdev, cache, - ch->base_addr_y, ch->base_addr_c, - &ch->base_addr_y, &ch->base_addr_c); - ch->cache = cache; - } } for (k = 0; k < ARRAY_SIZE(priv->overlays); ++k) { @@ -1223,13 +1174,6 @@ static void sh_mobile_lcdc_stop(struct sh_mobile_lcdc_priv *priv) } sh_mobile_lcdc_display_off(ch); - - /* Free the MERAM cache. */ - if (ch->cache) { - sh_mobile_meram_cache_free(priv->meram_dev, ch->cache); - ch->cache = NULL; - } - } /* stop the lcdc */ @@ -1851,11 +1795,6 @@ static int sh_mobile_lcdc_pan(struct fb_var_screeninfo *var, base_addr_c = ch->dma_handle + ch->xres_virtual * ch->yres_virtual + c_offset; - if (ch->cache) - sh_mobile_meram_cache_update(priv->meram_dev, ch->cache, - base_addr_y, base_addr_c, - &base_addr_y, &base_addr_c); - ch->base_addr_y = base_addr_y; ch->base_addr_c = base_addr_c; ch->pan_y_offset = y_offset; @@ -2149,10 +2088,8 @@ sh_mobile_lcdc_channel_fb_register(struct sh_mobile_lcdc_chan *ch) if (info->fbdefio) { ch->sglist = vmalloc(sizeof(struct scatterlist) * ch->fb_size >> PAGE_SHIFT); - if (!ch->sglist) { - dev_err(ch->lcdc->dev, "cannot allocate sglist\n"); + if (!ch->sglist) return -ENOMEM; - } } info->bl_dev = ch->bl; @@ -2354,8 +2291,7 @@ static int sh_mobile_lcdc_resume(struct device *dev) static int sh_mobile_lcdc_runtime_suspend(struct device *dev) { - struct platform_device *pdev = to_platform_device(dev); - struct sh_mobile_lcdc_priv *priv = platform_get_drvdata(pdev); + struct sh_mobile_lcdc_priv *priv = dev_get_drvdata(dev); /* turn off LCDC hardware */ lcdc_write(priv, _LDCNT1R, 0); @@ -2365,8 +2301,7 @@ static int sh_mobile_lcdc_runtime_suspend(struct device *dev) static int sh_mobile_lcdc_runtime_resume(struct device *dev) { - struct platform_device *pdev = to_platform_device(dev); - struct sh_mobile_lcdc_priv *priv = platform_get_drvdata(pdev); + struct sh_mobile_lcdc_priv *priv = dev_get_drvdata(dev); __sh_mobile_lcdc_start(priv); @@ -2718,13 +2653,11 @@ static int sh_mobile_lcdc_probe(struct platform_device *pdev) } priv = kzalloc(sizeof(*priv), GFP_KERNEL); - if (!priv) { - dev_err(&pdev->dev, "cannot allocate device data\n"); + if (!priv) return -ENOMEM; - } priv->dev = &pdev->dev; - priv->meram_dev = pdata->meram_dev; + for (i = 0; i < ARRAY_SIZE(priv->ch); i++) mutex_init(&priv->ch[i].open_lock); platform_set_drvdata(pdev, priv); diff --git a/drivers/video/fbdev/sh_mobile_lcdcfb.h b/drivers/video/fbdev/sh_mobile_lcdcfb.h index cc52c74721fe..b8e47a8bd8ab 100644 --- a/drivers/video/fbdev/sh_mobile_lcdcfb.h +++ b/drivers/video/fbdev/sh_mobile_lcdcfb.h @@ -61,7 +61,6 @@ struct sh_mobile_lcdc_chan { unsigned long *reg_offs; unsigned long ldmt1r_value; unsigned long enabled; /* ME and SE in LDCNT2R */ - void *cache; struct mutex open_lock; /* protects the use counter */ int use_count; diff --git a/drivers/video/fbdev/sh_mobile_meram.c b/drivers/video/fbdev/sh_mobile_meram.c deleted file mode 100644 index baadfb207b2e..000000000000 --- a/drivers/video/fbdev/sh_mobile_meram.c +++ /dev/null @@ -1,758 +0,0 @@ -/* - * SuperH Mobile MERAM Driver for SuperH Mobile LCDC Driver - * - * Copyright (c) 2011 Damian Hobson-Garcia <dhobsong@igel.co.jp> - * Takanari Hayama <taki@igel.co.jp> - * - * This file is subject to the terms and conditions of the GNU General Public - * License. See the file "COPYING" in the main directory of this archive - * for more details. - */ - -#include <linux/device.h> -#include <linux/err.h> -#include <linux/export.h> -#include <linux/genalloc.h> -#include <linux/io.h> -#include <linux/kernel.h> -#include <linux/module.h> -#include <linux/platform_device.h> -#include <linux/pm_runtime.h> -#include <linux/slab.h> - -#include <video/sh_mobile_meram.h> - -/* ----------------------------------------------------------------------------- - * MERAM registers - */ - -#define MEVCR1 0x4 -#define MEVCR1_RST (1 << 31) -#define MEVCR1_WD (1 << 30) -#define MEVCR1_AMD1 (1 << 29) -#define MEVCR1_AMD0 (1 << 28) -#define MEQSEL1 0x40 -#define MEQSEL2 0x44 - -#define MExxCTL 0x400 -#define MExxCTL_BV (1 << 31) -#define MExxCTL_BSZ_SHIFT 28 -#define MExxCTL_MSAR_MASK (0x7ff << MExxCTL_MSAR_SHIFT) -#define MExxCTL_MSAR_SHIFT 16 -#define MExxCTL_NXT_MASK (0x1f << MExxCTL_NXT_SHIFT) -#define MExxCTL_NXT_SHIFT 11 -#define MExxCTL_WD1 (1 << 10) -#define MExxCTL_WD0 (1 << 9) -#define MExxCTL_WS (1 << 8) -#define MExxCTL_CB (1 << 7) -#define MExxCTL_WBF (1 << 6) -#define MExxCTL_WF (1 << 5) -#define MExxCTL_RF (1 << 4) -#define MExxCTL_CM (1 << 3) -#define MExxCTL_MD_READ (1 << 0) -#define MExxCTL_MD_WRITE (2 << 0) -#define MExxCTL_MD_ICB_WB (3 << 0) -#define MExxCTL_MD_ICB (4 << 0) -#define MExxCTL_MD_FB (7 << 0) -#define MExxCTL_MD_MASK (7 << 0) -#define MExxBSIZE 0x404 -#define MExxBSIZE_RCNT_SHIFT 28 -#define MExxBSIZE_YSZM1_SHIFT 16 -#define MExxBSIZE_XSZM1_SHIFT 0 -#define MExxMNCF 0x408 -#define MExxMNCF_KWBNM_SHIFT 28 -#define MExxMNCF_KRBNM_SHIFT 24 -#define MExxMNCF_BNM_SHIFT 16 -#define MExxMNCF_XBV (1 << 15) -#define MExxMNCF_CPL_YCBCR444 (1 << 12) -#define MExxMNCF_CPL_YCBCR420 (2 << 12) -#define MExxMNCF_CPL_YCBCR422 (3 << 12) -#define MExxMNCF_CPL_MSK (3 << 12) -#define MExxMNCF_BL (1 << 2) -#define MExxMNCF_LNM_SHIFT 0 -#define MExxSARA 0x410 -#define MExxSARB 0x414 -#define MExxSBSIZE 0x418 -#define MExxSBSIZE_HDV (1 << 31) -#define MExxSBSIZE_HSZ16 (0 << 28) -#define MExxSBSIZE_HSZ32 (1 << 28) -#define MExxSBSIZE_HSZ64 (2 << 28) -#define MExxSBSIZE_HSZ128 (3 << 28) -#define MExxSBSIZE_SBSIZZ_SHIFT 0 - -#define MERAM_MExxCTL_VAL(next, addr) \ - ((((next) << MExxCTL_NXT_SHIFT) & MExxCTL_NXT_MASK) | \ - (((addr) << MExxCTL_MSAR_SHIFT) & MExxCTL_MSAR_MASK)) -#define MERAM_MExxBSIZE_VAL(rcnt, yszm1, xszm1) \ - (((rcnt) << MExxBSIZE_RCNT_SHIFT) | \ - ((yszm1) << MExxBSIZE_YSZM1_SHIFT) | \ - ((xszm1) << MExxBSIZE_XSZM1_SHIFT)) - -static const unsigned long common_regs[] = { - MEVCR1, - MEQSEL1, - MEQSEL2, -}; -#define MERAM_REGS_SIZE ARRAY_SIZE(common_regs) - -static const unsigned long icb_regs[] = { - MExxCTL, - MExxBSIZE, - MExxMNCF, - MExxSARA, - MExxSARB, - MExxSBSIZE, -}; -#define ICB_REGS_SIZE ARRAY_SIZE(icb_regs) - -/* - * sh_mobile_meram_icb - MERAM ICB information - * @regs: Registers cache - * @index: ICB index - * @offset: MERAM block offset - * @size: MERAM block size in KiB - * @cache_unit: Bytes to cache per ICB - * @pixelformat: Video pixel format of the data stored in the ICB - * @current_reg: Which of Start Address Register A (0) or B (1) is in use - */ -struct sh_mobile_meram_icb { - unsigned long regs[ICB_REGS_SIZE]; - unsigned int index; - unsigned long offset; - unsigned int size; - - unsigned int cache_unit; - unsigned int pixelformat; - unsigned int current_reg; -}; - -#define MERAM_ICB_NUM 32 - -struct sh_mobile_meram_fb_plane { - struct sh_mobile_meram_icb *marker; - struct sh_mobile_meram_icb *cache; -}; - -struct sh_mobile_meram_fb_cache { - unsigned int nplanes; - struct sh_mobile_meram_fb_plane planes[2]; -}; - -/* - * sh_mobile_meram_priv - MERAM device - * @base: Registers base address - * @meram: MERAM physical address - * @regs: Registers cache - * @lock: Protects used_icb and icbs - * @used_icb: Bitmask of used ICBs - * @icbs: ICBs - * @pool: Allocation pool to manage the MERAM - */ -struct sh_mobile_meram_priv { - void __iomem *base; - unsigned long meram; - unsigned long regs[MERAM_REGS_SIZE]; - - struct mutex lock; - unsigned long used_icb; - struct sh_mobile_meram_icb icbs[MERAM_ICB_NUM]; - - struct gen_pool *pool; -}; - -/* settings */ -#define MERAM_GRANULARITY 1024 -#define MERAM_SEC_LINE 15 -#define MERAM_LINE_WIDTH 2048 - -/* ----------------------------------------------------------------------------- - * Registers access - */ - -#define MERAM_ICB_OFFSET(base, idx, off) ((base) + (off) + (idx) * 0x20) - -static inline void meram_write_icb(void __iomem *base, unsigned int idx, - unsigned int off, unsigned long val) -{ - iowrite32(val, MERAM_ICB_OFFSET(base, idx, off)); -} - -static inline unsigned long meram_read_icb(void __iomem *base, unsigned int idx, - unsigned int off) -{ - return ioread32(MERAM_ICB_OFFSET(base, idx, off)); -} - -static inline void meram_write_reg(void __iomem *base, unsigned int off, - unsigned long val) -{ - iowrite32(val, base + off); -} - -static inline unsigned long meram_read_reg(void __iomem *base, unsigned int off) -{ - return ioread32(base + off); -} - -/* ----------------------------------------------------------------------------- - * MERAM allocation and free - */ - -static unsigned long meram_alloc(struct sh_mobile_meram_priv *priv, size_t size) -{ - return gen_pool_alloc(priv->pool, size); -} - -static void meram_free(struct sh_mobile_meram_priv *priv, unsigned long mem, - size_t size) -{ - gen_pool_free(priv->pool, mem, size); -} - -/* ----------------------------------------------------------------------------- - * LCDC cache planes allocation, init, cleanup and free - */ - -/* Allocate ICBs and MERAM for a plane. */ -static int meram_plane_alloc(struct sh_mobile_meram_priv *priv, - struct sh_mobile_meram_fb_plane *plane, - size_t size) -{ - unsigned long mem; - unsigned long idx; - - idx = find_first_zero_bit(&priv->used_icb, 28); - if (idx == 28) - return -ENOMEM; - plane->cache = &priv->icbs[idx]; - - idx = find_next_zero_bit(&priv->used_icb, 32, 28); - if (idx == 32) - return -ENOMEM; - plane->marker = &priv->icbs[idx]; - - mem = meram_alloc(priv, size * 1024); - if (mem == 0) - return -ENOMEM; - - __set_bit(plane->marker->index, &priv->used_icb); - __set_bit(plane->cache->index, &priv->used_icb); - - plane->marker->offset = mem - priv->meram; - plane->marker->size = size; - - return 0; -} - -/* Free ICBs and MERAM for a plane. */ -static void meram_plane_free(struct sh_mobile_meram_priv *priv, - struct sh_mobile_meram_fb_plane *plane) -{ - meram_free(priv, priv->meram + plane->marker->offset, - plane->marker->size * 1024); - - __clear_bit(plane->marker->index, &priv->used_icb); - __clear_bit(plane->cache->index, &priv->used_icb); -} - -/* Is this a YCbCr(NV12, NV16 or NV24) colorspace? */ -static int is_nvcolor(int cspace) -{ - if (cspace == SH_MOBILE_MERAM_PF_NV || - cspace == SH_MOBILE_MERAM_PF_NV24) - return 1; - return 0; -} - -/* Set the next address to fetch. */ -static void meram_set_next_addr(struct sh_mobile_meram_priv *priv, - struct sh_mobile_meram_fb_cache *cache, - unsigned long base_addr_y, - unsigned long base_addr_c) -{ - struct sh_mobile_meram_icb *icb = cache->planes[0].marker; - unsigned long target; - - icb->current_reg ^= 1; - target = icb->current_reg ? MExxSARB : MExxSARA; - - /* set the next address to fetch */ - meram_write_icb(priv->base, cache->planes[0].cache->index, target, - base_addr_y); - meram_write_icb(priv->base, cache->planes[0].marker->index, target, - base_addr_y + cache->planes[0].marker->cache_unit); - - if (cache->nplanes == 2) { - meram_write_icb(priv->base, cache->planes[1].cache->index, - target, base_addr_c); - meram_write_icb(priv->base, cache->planes[1].marker->index, - target, base_addr_c + - cache->planes[1].marker->cache_unit); - } -} - -/* Get the next ICB address. */ -static void -meram_get_next_icb_addr(struct sh_mobile_meram_info *pdata, - struct sh_mobile_meram_fb_cache *cache, - unsigned long *icb_addr_y, unsigned long *icb_addr_c) -{ - struct sh_mobile_meram_icb *icb = cache->planes[0].marker; - unsigned long icb_offset; - - if (pdata->addr_mode == SH_MOBILE_MERAM_MODE0) - icb_offset = 0x80000000 | (icb->current_reg << 29); - else - icb_offset = 0xc0000000 | (icb->current_reg << 23); - - *icb_addr_y = icb_offset | (cache->planes[0].marker->index << 24); - if (cache->nplanes == 2) - *icb_addr_c = icb_offset - | (cache->planes[1].marker->index << 24); -} - -#define MERAM_CALC_BYTECOUNT(x, y) \ - (((x) * (y) + (MERAM_LINE_WIDTH - 1)) & ~(MERAM_LINE_WIDTH - 1)) - -/* Initialize MERAM. */ -static int meram_plane_init(struct sh_mobile_meram_priv *priv, - struct sh_mobile_meram_fb_plane *plane, - unsigned int xres, unsigned int yres, - unsigned int *out_pitch) -{ - struct sh_mobile_meram_icb *marker = plane->marker; - unsigned long total_byte_count = MERAM_CALC_BYTECOUNT(xres, yres); - unsigned long bnm; - unsigned int lcdc_pitch; - unsigned int xpitch; - unsigned int line_cnt; - unsigned int save_lines; - - /* adjust pitch to 1024, 2048, 4096 or 8192 */ - lcdc_pitch = (xres - 1) | 1023; - lcdc_pitch = lcdc_pitch | (lcdc_pitch >> 1); - lcdc_pitch = lcdc_pitch | (lcdc_pitch >> 2); - lcdc_pitch += 1; - - /* derive settings */ - if (lcdc_pitch == 8192 && yres >= 1024) { - lcdc_pitch = xpitch = MERAM_LINE_WIDTH; - line_cnt = total_byte_count >> 11; - *out_pitch = xres; - save_lines = plane->marker->size / 16 / MERAM_SEC_LINE; - save_lines *= MERAM_SEC_LINE; - } else { - xpitch = xres; - line_cnt = yres; - *out_pitch = lcdc_pitch; - save_lines = plane->marker->size / (lcdc_pitch >> 10) / 2; - save_lines &= 0xff; - } - bnm = (save_lines - 1) << 16; - - /* TODO: we better to check if we have enough MERAM buffer size */ - - /* set up ICB */ - meram_write_icb(priv->base, plane->cache->index, MExxBSIZE, - MERAM_MExxBSIZE_VAL(0x0, line_cnt - 1, xpitch - 1)); - meram_write_icb(priv->base, plane->marker->index, MExxBSIZE, - MERAM_MExxBSIZE_VAL(0xf, line_cnt - 1, xpitch - 1)); - - meram_write_icb(priv->base, plane->cache->index, MExxMNCF, bnm); - meram_write_icb(priv->base, plane->marker->index, MExxMNCF, bnm); - - meram_write_icb(priv->base, plane->cache->index, MExxSBSIZE, xpitch); - meram_write_icb(priv->base, plane->marker->index, MExxSBSIZE, xpitch); - - /* save a cache unit size */ - plane->cache->cache_unit = xres * save_lines; - plane->marker->cache_unit = xres * save_lines; - - /* - * Set MERAM for framebuffer - * - * we also chain the cache_icb and the marker_icb. - * we also split the allocated MERAM buffer between two ICBs. - */ - meram_write_icb(priv->base, plane->cache->index, MExxCTL, - MERAM_MExxCTL_VAL(plane->marker->index, marker->offset) - | MExxCTL_WD1 | MExxCTL_WD0 | MExxCTL_WS | MExxCTL_CM | - MExxCTL_MD_FB); - meram_write_icb(priv->base, plane->marker->index, MExxCTL, - MERAM_MExxCTL_VAL(plane->cache->index, marker->offset + - plane->marker->size / 2) | - MExxCTL_WD1 | MExxCTL_WD0 | MExxCTL_WS | MExxCTL_CM | - MExxCTL_MD_FB); - - return 0; -} - -static void meram_plane_cleanup(struct sh_mobile_meram_priv *priv, - struct sh_mobile_meram_fb_plane *plane) -{ - /* disable ICB */ - meram_write_icb(priv->base, plane->cache->index, MExxCTL, - MExxCTL_WBF | MExxCTL_WF | MExxCTL_RF); - meram_write_icb(priv->base, plane->marker->index, MExxCTL, - MExxCTL_WBF | MExxCTL_WF | MExxCTL_RF); - - plane->cache->cache_unit = 0; - plane->marker->cache_unit = 0; -} - -/* ----------------------------------------------------------------------------- - * MERAM operations - */ - -unsigned long sh_mobile_meram_alloc(struct sh_mobile_meram_info *pdata, - size_t size) -{ - struct sh_mobile_meram_priv *priv = pdata->priv; - - return meram_alloc(priv, size); -} -EXPORT_SYMBOL_GPL(sh_mobile_meram_alloc); - -void sh_mobile_meram_free(struct sh_mobile_meram_info *pdata, unsigned long mem, - size_t size) -{ - struct sh_mobile_meram_priv *priv = pdata->priv; - - meram_free(priv, mem, size); -} -EXPORT_SYMBOL_GPL(sh_mobile_meram_free); - -/* Allocate memory for the ICBs and mark them as used. */ -static struct sh_mobile_meram_fb_cache * -meram_cache_alloc(struct sh_mobile_meram_priv *priv, - const struct sh_mobile_meram_cfg *cfg, - int pixelformat) -{ - unsigned int nplanes = is_nvcolor(pixelformat) ? 2 : 1; - struct sh_mobile_meram_fb_cache *cache; - int ret; - - cache = kzalloc(sizeof(*cache), GFP_KERNEL); - if (cache == NULL) - return ERR_PTR(-ENOMEM); - - cache->nplanes = nplanes; - - ret = meram_plane_alloc(priv, &cache->planes[0], - cfg->icb[0].meram_size); - if (ret < 0) - goto error; - - cache->planes[0].marker->current_reg = 1; - cache->planes[0].marker->pixelformat = pixelformat; - - if (cache->nplanes == 1) - return cache; - - ret = meram_plane_alloc(priv, &cache->planes[1], - cfg->icb[1].meram_size); - if (ret < 0) { - meram_plane_free(priv, &cache->planes[0]); - goto error; - } - - return cache; - -error: - kfree(cache); - return ERR_PTR(-ENOMEM); -} - -void *sh_mobile_meram_cache_alloc(struct sh_mobile_meram_info *pdata, - const struct sh_mobile_meram_cfg *cfg, - unsigned int xres, unsigned int yres, - unsigned int pixelformat, unsigned int *pitch) -{ - struct sh_mobile_meram_fb_cache *cache; - struct sh_mobile_meram_priv *priv = pdata->priv; - struct platform_device *pdev = pdata->pdev; - unsigned int nplanes = is_nvcolor(pixelformat) ? 2 : 1; - unsigned int out_pitch; - - if (priv == NULL) - return ERR_PTR(-ENODEV); - - if (pixelformat != SH_MOBILE_MERAM_PF_NV && - pixelformat != SH_MOBILE_MERAM_PF_NV24 && - pixelformat != SH_MOBILE_MERAM_PF_RGB) - return ERR_PTR(-EINVAL); - - dev_dbg(&pdev->dev, "registering %dx%d (%s)", xres, yres, - !pixelformat ? "yuv" : "rgb"); - - /* we can't handle wider than 8192px */ - if (xres > 8192) { - dev_err(&pdev->dev, "width exceeding the limit (> 8192)."); - return ERR_PTR(-EINVAL); - } - - if (cfg->icb[0].meram_size == 0) - return ERR_PTR(-EINVAL); - - if (nplanes == 2 && cfg->icb[1].meram_size == 0) - return ERR_PTR(-EINVAL); - - mutex_lock(&priv->lock); - - /* We now register the ICBs and allocate the MERAM regions. */ - cache = meram_cache_alloc(priv, cfg, pixelformat); - if (IS_ERR(cache)) { - dev_err(&pdev->dev, "MERAM allocation failed (%ld).", - PTR_ERR(cache)); - goto err; - } - - /* initialize MERAM */ - meram_plane_init(priv, &cache->planes[0], xres, yres, &out_pitch); - *pitch = out_pitch; - if (pixelformat == SH_MOBILE_MERAM_PF_NV) - meram_plane_init(priv, &cache->planes[1], - xres, (yres + 1) / 2, &out_pitch); - else if (pixelformat == SH_MOBILE_MERAM_PF_NV24) - meram_plane_init(priv, &cache->planes[1], - 2 * xres, (yres + 1) / 2, &out_pitch); - -err: - mutex_unlock(&priv->lock); - return cache; -} -EXPORT_SYMBOL_GPL(sh_mobile_meram_cache_alloc); - -void -sh_mobile_meram_cache_free(struct sh_mobile_meram_info *pdata, void *data) -{ - struct sh_mobile_meram_fb_cache *cache = data; - struct sh_mobile_meram_priv *priv = pdata->priv; - - mutex_lock(&priv->lock); - - /* Cleanup and free. */ - meram_plane_cleanup(priv, &cache->planes[0]); - meram_plane_free(priv, &cache->planes[0]); - - if (cache->nplanes == 2) { - meram_plane_cleanup(priv, &cache->planes[1]); - meram_plane_free(priv, &cache->planes[1]); - } - - kfree(cache); - - mutex_unlock(&priv->lock); -} -EXPORT_SYMBOL_GPL(sh_mobile_meram_cache_free); - -void -sh_mobile_meram_cache_update(struct sh_mobile_meram_info *pdata, void *data, - unsigned long base_addr_y, - unsigned long base_addr_c, - unsigned long *icb_addr_y, - unsigned long *icb_addr_c) -{ - struct sh_mobile_meram_fb_cache *cache = data; - struct sh_mobile_meram_priv *priv = pdata->priv; - - mutex_lock(&priv->lock); - - meram_set_next_addr(priv, cache, base_addr_y, base_addr_c); - meram_get_next_icb_addr(pdata, cache, icb_addr_y, icb_addr_c); - - mutex_unlock(&priv->lock); -} -EXPORT_SYMBOL_GPL(sh_mobile_meram_cache_update); - -/* ----------------------------------------------------------------------------- - * Power management - */ - -#ifdef CONFIG_PM -static int sh_mobile_meram_suspend(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct sh_mobile_meram_priv *priv = platform_get_drvdata(pdev); - unsigned int i, j; - - for (i = 0; i < MERAM_REGS_SIZE; i++) - priv->regs[i] = meram_read_reg(priv->base, common_regs[i]); - - for (i = 0; i < 32; i++) { - if (!test_bit(i, &priv->used_icb)) - continue; - for (j = 0; j < ICB_REGS_SIZE; j++) { - priv->icbs[i].regs[j] = - meram_read_icb(priv->base, i, icb_regs[j]); - /* Reset ICB on resume */ - if (icb_regs[j] == MExxCTL) - priv->icbs[i].regs[j] |= - MExxCTL_WBF | MExxCTL_WF | MExxCTL_RF; - } - } - return 0; -} - -static int sh_mobile_meram_resume(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct sh_mobile_meram_priv *priv = platform_get_drvdata(pdev); - unsigned int i, j; - - for (i = 0; i < 32; i++) { - if (!test_bit(i, &priv->used_icb)) - continue; - for (j = 0; j < ICB_REGS_SIZE; j++) - meram_write_icb(priv->base, i, icb_regs[j], - priv->icbs[i].regs[j]); - } - - for (i = 0; i < MERAM_REGS_SIZE; i++) - meram_write_reg(priv->base, common_regs[i], priv->regs[i]); - return 0; -} -#endif /* CONFIG_PM */ - -static UNIVERSAL_DEV_PM_OPS(sh_mobile_meram_dev_pm_ops, - sh_mobile_meram_suspend, - sh_mobile_meram_resume, NULL); - -/* ----------------------------------------------------------------------------- - * Probe/remove and driver init/exit - */ - -static int sh_mobile_meram_probe(struct platform_device *pdev) -{ - struct sh_mobile_meram_priv *priv; - struct sh_mobile_meram_info *pdata = pdev->dev.platform_data; - struct resource *regs; - struct resource *meram; - unsigned int i; - int error; - - if (!pdata) { - dev_err(&pdev->dev, "no platform data defined\n"); - return -EINVAL; - } - - regs = platform_get_resource(pdev, IORESOURCE_MEM, 0); - meram = platform_get_resource(pdev, IORESOURCE_MEM, 1); - if (regs == NULL || meram == NULL) { - dev_err(&pdev->dev, "cannot get platform resources\n"); - return -ENOENT; - } - - priv = kzalloc(sizeof(*priv), GFP_KERNEL); - if (!priv) { - dev_err(&pdev->dev, "cannot allocate device data\n"); - return -ENOMEM; - } - - /* Initialize private data. */ - mutex_init(&priv->lock); - priv->used_icb = pdata->reserved_icbs; - - for (i = 0; i < MERAM_ICB_NUM; ++i) - priv->icbs[i].index = i; - - pdata->priv = priv; - pdata->pdev = pdev; - - /* Request memory regions and remap the registers. */ - if (!request_mem_region(regs->start, resource_size(regs), pdev->name)) { - dev_err(&pdev->dev, "MERAM registers region already claimed\n"); - error = -EBUSY; - goto err_req_regs; - } - - if (!request_mem_region(meram->start, resource_size(meram), - pdev->name)) { - dev_err(&pdev->dev, "MERAM memory region already claimed\n"); - error = -EBUSY; - goto err_req_meram; - } - - priv->base = ioremap_nocache(regs->start, resource_size(regs)); - if (!priv->base) { - dev_err(&pdev->dev, "ioremap failed\n"); - error = -EFAULT; - goto err_ioremap; - } - - priv->meram = meram->start; - - /* Create and initialize the MERAM memory pool. */ - priv->pool = gen_pool_create(ilog2(MERAM_GRANULARITY), -1); - if (priv->pool == NULL) { - error = -ENOMEM; - goto err_genpool; - } - - error = gen_pool_add(priv->pool, meram->start, resource_size(meram), - -1); - if (error < 0) - goto err_genpool; - - /* initialize ICB addressing mode */ - if (pdata->addr_mode == SH_MOBILE_MERAM_MODE1) - meram_write_reg(priv->base, MEVCR1, MEVCR1_AMD1); - - platform_set_drvdata(pdev, priv); - pm_runtime_enable(&pdev->dev); - - dev_info(&pdev->dev, "sh_mobile_meram initialized."); - - return 0; - -err_genpool: - if (priv->pool) - gen_pool_destroy(priv->pool); - iounmap(priv->base); -err_ioremap: - release_mem_region(meram->start, resource_size(meram)); -err_req_meram: - release_mem_region(regs->start, resource_size(regs)); -err_req_regs: - mutex_destroy(&priv->lock); - kfree(priv); - - return error; -} - - -static int sh_mobile_meram_remove(struct platform_device *pdev) -{ - struct sh_mobile_meram_priv *priv = platform_get_drvdata(pdev); - struct resource *regs = platform_get_resource(pdev, IORESOURCE_MEM, 0); - struct resource *meram = platform_get_resource(pdev, IORESOURCE_MEM, 1); - - pm_runtime_disable(&pdev->dev); - - gen_pool_destroy(priv->pool); - - iounmap(priv->base); - release_mem_region(meram->start, resource_size(meram)); - release_mem_region(regs->start, resource_size(regs)); - - mutex_destroy(&priv->lock); - - kfree(priv); - - return 0; -} - -static struct platform_driver sh_mobile_meram_driver = { - .driver = { - .name = "sh_mobile_meram", - .pm = &sh_mobile_meram_dev_pm_ops, - }, - .probe = sh_mobile_meram_probe, - .remove = sh_mobile_meram_remove, -}; - -module_platform_driver(sh_mobile_meram_driver); - -MODULE_DESCRIPTION("SuperH Mobile MERAM driver"); -MODULE_AUTHOR("Damian Hobson-Garcia / Takanari Hayama"); -MODULE_LICENSE("GPL v2"); diff --git a/drivers/video/fbdev/sm501fb.c b/drivers/video/fbdev/sm501fb.c index 6f0a19501c6a..dde52d027416 100644 --- a/drivers/video/fbdev/sm501fb.c +++ b/drivers/video/fbdev/sm501fb.c @@ -1932,8 +1932,7 @@ static int sm501fb_probe(struct platform_device *pdev) int ret; /* allocate our framebuffers */ - - info = kzalloc(sizeof(struct sm501fb_info), GFP_KERNEL); + info = kzalloc(sizeof(*info), GFP_KERNEL); if (!info) { dev_err(dev, "failed to allocate state\n"); return -ENOMEM; diff --git a/drivers/video/fbdev/via/global.h b/drivers/video/fbdev/via/global.h index 275dbbbd6b81..649d2ca5516e 100644 --- a/drivers/video/fbdev/via/global.h +++ b/drivers/video/fbdev/via/global.h @@ -33,6 +33,12 @@ #include <linux/console.h> #include <linux/timer.h> +#ifdef CONFIG_X86 +#include <asm/olpc.h> +#else +#define machine_is_olpc(x) 0 +#endif + #include "debug.h" #include "viafbdev.h" diff --git a/drivers/video/fbdev/via/hw.c b/drivers/video/fbdev/via/hw.c index 22450908306c..48969c644599 100644 --- a/drivers/video/fbdev/via/hw.c +++ b/drivers/video/fbdev/via/hw.c @@ -20,7 +20,6 @@ */ #include <linux/via-core.h> -#include <asm/olpc.h> #include "global.h" #include "via_clock.h" diff --git a/drivers/video/fbdev/via/via-core.c b/drivers/video/fbdev/via/via-core.c index 77774d8abf94..b041eb27a9bf 100644 --- a/drivers/video/fbdev/via/via-core.c +++ b/drivers/video/fbdev/via/via-core.c @@ -17,7 +17,6 @@ #include <linux/platform_device.h> #include <linux/list.h> #include <linux/pm.h> -#include <asm/olpc.h> /* * The default port config. diff --git a/drivers/video/fbdev/via/via_clock.c b/drivers/video/fbdev/via/via_clock.c index bf269fa43977..3d0efdbaea58 100644 --- a/drivers/video/fbdev/via/via_clock.c +++ b/drivers/video/fbdev/via/via_clock.c @@ -25,7 +25,7 @@ #include <linux/kernel.h> #include <linux/via-core.h> -#include <asm/olpc.h> + #include "via_clock.h" #include "global.h" #include "debug.h" diff --git a/drivers/video/fbdev/via/viafbdev.c b/drivers/video/fbdev/via/viafbdev.c index 52f577b0669b..d2f785068ef4 100644 --- a/drivers/video/fbdev/via/viafbdev.c +++ b/drivers/video/fbdev/via/viafbdev.c @@ -25,7 +25,6 @@ #include <linux/stat.h> #include <linux/via-core.h> #include <linux/via_i2c.h> -#include <asm/olpc.h> #define _MASTER_FILE #include "global.h" diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index b563a4499cc8..705aebd74e56 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -578,6 +578,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev); + pci_disable_sriov(pci_dev); + unregister_virtio_device(&vp_dev->vdev); if (vp_dev->ioaddr) @@ -589,6 +591,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) put_device(dev); } +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs) +{ + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); + struct virtio_device *vdev = &vp_dev->vdev; + int ret; + + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK)) + return -EBUSY; + + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV)) + return -EINVAL; + + if (pci_vfs_assigned(pci_dev)) + return -EPERM; + + if (num_vfs == 0) { + pci_disable_sriov(pci_dev); + return 0; + } + + ret = pci_enable_sriov(pci_dev, num_vfs); + if (ret < 0) + return ret; + + return num_vfs; +} + static struct pci_driver virtio_pci_driver = { .name = "virtio-pci", .id_table = virtio_pci_id_table, @@ -597,6 +626,7 @@ static struct pci_driver virtio_pci_driver = { #ifdef CONFIG_PM_SLEEP .driver.pm = &virtio_pci_pm_ops, #endif + .sriov_configure = virtio_pci_sriov_configure, }; module_pci_driver(virtio_pci_driver); diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 2555d80f6eec..07571daccfec 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev) return features; } +static void vp_transport_features(struct virtio_device *vdev, u64 features) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct pci_dev *pci_dev = vp_dev->pci_dev; + + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) && + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV)) + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV); +} + /* virtio config->finalize_features() implementation */ static int vp_finalize_features(struct virtio_device *vdev) { struct virtio_pci_device *vp_dev = to_vp_device(vdev); + u64 features = vdev->features; /* Give virtio_ring a chance to accept features. */ vring_transport_features(vdev); + /* Give virtio_pci a chance to accept features. */ + vp_transport_features(vdev, features); + if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) { dev_err(&vdev->dev, "virtio: device uses modern interface " "but does not have VIRTIO_F_VERSION_1\n"); diff --git a/fs/afs/Makefile b/fs/afs/Makefile index 532acae25453..546874057bd3 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -5,7 +5,7 @@ afs-cache-$(CONFIG_AFS_FSCACHE) := cache.o -kafs-objs := \ +kafs-y := \ $(afs-cache-y) \ addr_list.o \ callback.o \ @@ -21,7 +21,6 @@ kafs-objs := \ main.o \ misc.o \ mntpt.o \ - proc.o \ rotate.o \ rxrpc.o \ security.o \ @@ -34,4 +33,5 @@ kafs-objs := \ write.o \ xattr.o +kafs-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_AFS_FS) := kafs.o diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 2c46c46f3a6d..025a9a5e1c32 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -215,7 +215,7 @@ struct afs_addr_list *afs_dns_query(struct afs_cell *cell, time64_t *_expiry) _enter("%s", cell->name); ret = dns_query("afsdb", cell->name, cell->name_len, - "ipv4", &vllist, _expiry); + "", &vllist, _expiry); if (ret < 0) return ERR_PTR(ret); diff --git a/fs/afs/callback.c b/fs/afs/callback.c index 571437dcb252..5f261fbf2182 100644 --- a/fs/afs/callback.c +++ b/fs/afs/callback.c @@ -21,6 +21,66 @@ #include "internal.h" /* + * Create volume and callback interests on a server. + */ +static struct afs_cb_interest *afs_create_interest(struct afs_server *server, + struct afs_vnode *vnode) +{ + struct afs_vol_interest *new_vi, *vi; + struct afs_cb_interest *new; + struct hlist_node **pp; + + new_vi = kzalloc(sizeof(struct afs_vol_interest), GFP_KERNEL); + if (!new_vi) + return NULL; + + new = kzalloc(sizeof(struct afs_cb_interest), GFP_KERNEL); + if (!new) { + kfree(new_vi); + return NULL; + } + + new_vi->usage = 1; + new_vi->vid = vnode->volume->vid; + INIT_HLIST_NODE(&new_vi->srv_link); + INIT_HLIST_HEAD(&new_vi->cb_interests); + + refcount_set(&new->usage, 1); + new->sb = vnode->vfs_inode.i_sb; + new->vid = vnode->volume->vid; + new->server = afs_get_server(server); + INIT_HLIST_NODE(&new->cb_vlink); + + write_lock(&server->cb_break_lock); + + for (pp = &server->cb_volumes.first; *pp; pp = &(*pp)->next) { + vi = hlist_entry(*pp, struct afs_vol_interest, srv_link); + if (vi->vid < new_vi->vid) + continue; + if (vi->vid > new_vi->vid) + break; + vi->usage++; + goto found_vi; + } + + new_vi->srv_link.pprev = pp; + new_vi->srv_link.next = *pp; + if (*pp) + (*pp)->pprev = &new_vi->srv_link.next; + *pp = &new_vi->srv_link; + vi = new_vi; + new_vi = NULL; +found_vi: + + new->vol_interest = vi; + hlist_add_head(&new->cb_vlink, &vi->cb_interests); + + write_unlock(&server->cb_break_lock); + kfree(new_vi); + return new; +} + +/* * Set up an interest-in-callbacks record for a volume on a server and * register it with the server. * - Called with vnode->io_lock held. @@ -77,20 +137,10 @@ again: } if (!cbi) { - new = kzalloc(sizeof(struct afs_cb_interest), GFP_KERNEL); + new = afs_create_interest(server, vnode); if (!new) return -ENOMEM; - refcount_set(&new->usage, 1); - new->sb = vnode->vfs_inode.i_sb; - new->vid = vnode->volume->vid; - new->server = afs_get_server(server); - INIT_LIST_HEAD(&new->cb_link); - - write_lock(&server->cb_break_lock); - list_add_tail(&new->cb_link, &server->cb_interests); - write_unlock(&server->cb_break_lock); - write_lock(&slist->lock); if (!entry->cb_interest) { entry->cb_interest = afs_get_cb_interest(new); @@ -126,11 +176,22 @@ again: */ void afs_put_cb_interest(struct afs_net *net, struct afs_cb_interest *cbi) { + struct afs_vol_interest *vi; + if (cbi && refcount_dec_and_test(&cbi->usage)) { - if (!list_empty(&cbi->cb_link)) { + if (!hlist_unhashed(&cbi->cb_vlink)) { write_lock(&cbi->server->cb_break_lock); - list_del_init(&cbi->cb_link); + + hlist_del_init(&cbi->cb_vlink); + vi = cbi->vol_interest; + cbi->vol_interest = NULL; + if (--vi->usage == 0) + hlist_del(&vi->srv_link); + else + vi = NULL; + write_unlock(&cbi->server->cb_break_lock); + kfree(vi); afs_put_server(net, cbi->server); } kfree(cbi); @@ -182,20 +243,34 @@ void afs_break_callback(struct afs_vnode *vnode) static void afs_break_one_callback(struct afs_server *server, struct afs_fid *fid) { + struct afs_vol_interest *vi; struct afs_cb_interest *cbi; struct afs_iget_data data; struct afs_vnode *vnode; struct inode *inode; read_lock(&server->cb_break_lock); + hlist_for_each_entry(vi, &server->cb_volumes, srv_link) { + if (vi->vid < fid->vid) + continue; + if (vi->vid > fid->vid) { + vi = NULL; + break; + } + //atomic_inc(&vi->usage); + break; + } + + /* TODO: Find all matching volumes if we couldn't match the server and + * break them anyway. + */ + if (!vi) + goto out; /* Step through all interested superblocks. There may be more than one * because of cell aliasing. */ - list_for_each_entry(cbi, &server->cb_interests, cb_link) { - if (cbi->vid != fid->vid) - continue; - + hlist_for_each_entry(cbi, &vi->cb_interests, cb_vlink) { if (fid->vnode == 0 && fid->unique == 0) { /* The callback break applies to an entire volume. */ struct afs_super_info *as = AFS_FS_S(cbi->sb); @@ -217,6 +292,7 @@ static void afs_break_one_callback(struct afs_server *server, } } +out: read_unlock(&server->cb_break_lock); } diff --git a/fs/afs/cell.c b/fs/afs/cell.c index fdf4c36cff79..f3d0bef16d78 100644 --- a/fs/afs/cell.c +++ b/fs/afs/cell.c @@ -15,6 +15,7 @@ #include <linux/dns_resolver.h> #include <linux/sched.h> #include <linux/inet.h> +#include <linux/namei.h> #include <keys/rxrpc-type.h> #include "internal.h" @@ -341,8 +342,8 @@ int afs_cell_init(struct afs_net *net, const char *rootcell) /* install the new cell */ write_seqlock(&net->cells_lock); - old_root = net->ws_cell; - net->ws_cell = new_root; + old_root = rcu_access_pointer(net->ws_cell); + rcu_assign_pointer(net->ws_cell, new_root); write_sequnlock(&net->cells_lock); afs_put_cell(net, old_root); @@ -528,12 +529,14 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell) NULL, 0, cell, 0, true); #endif - ret = afs_proc_cell_setup(net, cell); + ret = afs_proc_cell_setup(cell); if (ret < 0) return ret; - spin_lock(&net->proc_cells_lock); + + mutex_lock(&net->proc_cells_lock); list_add_tail(&cell->proc_link, &net->proc_cells); - spin_unlock(&net->proc_cells_lock); + afs_dynroot_mkdir(net, cell); + mutex_unlock(&net->proc_cells_lock); return 0; } @@ -544,11 +547,12 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell) { _enter("%s", cell->name); - afs_proc_cell_remove(net, cell); + afs_proc_cell_remove(cell); - spin_lock(&net->proc_cells_lock); + mutex_lock(&net->proc_cells_lock); list_del_init(&cell->proc_link); - spin_unlock(&net->proc_cells_lock); + afs_dynroot_rmdir(net, cell); + mutex_unlock(&net->proc_cells_lock); #ifdef CONFIG_AFS_FSCACHE fscache_relinquish_cookie(cell->cache, NULL, false); @@ -755,8 +759,8 @@ void afs_cell_purge(struct afs_net *net) _enter(""); write_seqlock(&net->cells_lock); - ws = net->ws_cell; - net->ws_cell = NULL; + ws = rcu_access_pointer(net->ws_cell); + RCU_INIT_POINTER(net->ws_cell, NULL); write_sequnlock(&net->cells_lock); afs_put_cell(net, ws); diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 238fd28cfdd2..9e51d6fe7e8f 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -526,7 +526,7 @@ static void SRXAFSCB_TellMeAboutYourself(struct work_struct *work) nifs = 0; ifs = kcalloc(32, sizeof(*ifs), GFP_KERNEL); if (ifs) { - nifs = afs_get_ipv4_interfaces(ifs, 32, false); + nifs = afs_get_ipv4_interfaces(call->net, ifs, 32, false); if (nifs < 0) { kfree(ifs); ifs = NULL; diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c index 983f3946ab57..174e843f0633 100644 --- a/fs/afs/dynroot.c +++ b/fs/afs/dynroot.c @@ -1,4 +1,4 @@ -/* dir.c: AFS dynamic root handling +/* AFS dynamic root handling * * Copyright (C) 2018 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) @@ -46,7 +46,7 @@ static int afs_probe_cell_name(struct dentry *dentry) return 0; } - ret = dns_query("afsdb", name, len, "ipv4", NULL, NULL); + ret = dns_query("afsdb", name, len, "", NULL, NULL); if (ret == -ENODATA) ret = -EDESTADDRREQ; return ret; @@ -207,3 +207,125 @@ const struct dentry_operations afs_dynroot_dentry_operations = { .d_release = afs_d_release, .d_automount = afs_d_automount, }; + +/* + * Create a manually added cell mount directory. + * - The caller must hold net->proc_cells_lock + */ +int afs_dynroot_mkdir(struct afs_net *net, struct afs_cell *cell) +{ + struct super_block *sb = net->dynroot_sb; + struct dentry *root, *subdir; + int ret; + + if (!sb || atomic_read(&sb->s_active) == 0) + return 0; + + /* Let the ->lookup op do the creation */ + root = sb->s_root; + inode_lock(root->d_inode); + subdir = lookup_one_len(cell->name, root, cell->name_len); + if (IS_ERR(subdir)) { + ret = PTR_ERR(subdir); + goto unlock; + } + + /* Note that we're retaining an extra ref on the dentry */ + subdir->d_fsdata = (void *)1UL; + ret = 0; +unlock: + inode_unlock(root->d_inode); + return ret; +} + +/* + * Remove a manually added cell mount directory. + * - The caller must hold net->proc_cells_lock + */ +void afs_dynroot_rmdir(struct afs_net *net, struct afs_cell *cell) +{ + struct super_block *sb = net->dynroot_sb; + struct dentry *root, *subdir; + + if (!sb || atomic_read(&sb->s_active) == 0) + return; + + root = sb->s_root; + inode_lock(root->d_inode); + + /* Don't want to trigger a lookup call, which will re-add the cell */ + subdir = try_lookup_one_len(cell->name, root, cell->name_len); + if (IS_ERR_OR_NULL(subdir)) { + _debug("lookup %ld", PTR_ERR(subdir)); + goto no_dentry; + } + + _debug("rmdir %pd %u", subdir, d_count(subdir)); + + if (subdir->d_fsdata) { + _debug("unpin %u", d_count(subdir)); + subdir->d_fsdata = NULL; + dput(subdir); + } + dput(subdir); +no_dentry: + inode_unlock(root->d_inode); + _leave(""); +} + +/* + * Populate a newly created dynamic root with cell names. + */ +int afs_dynroot_populate(struct super_block *sb) +{ + struct afs_cell *cell; + struct afs_net *net = afs_sb2net(sb); + int ret; + + if (mutex_lock_interruptible(&net->proc_cells_lock) < 0) + return -ERESTARTSYS; + + net->dynroot_sb = sb; + list_for_each_entry(cell, &net->proc_cells, proc_link) { + ret = afs_dynroot_mkdir(net, cell); + if (ret < 0) + goto error; + } + + ret = 0; +out: + mutex_unlock(&net->proc_cells_lock); + return ret; + +error: + net->dynroot_sb = NULL; + goto out; +} + +/* + * When a dynamic root that's in the process of being destroyed, depopulate it + * of pinned directories. + */ +void afs_dynroot_depopulate(struct super_block *sb) +{ + struct afs_net *net = afs_sb2net(sb); + struct dentry *root = sb->s_root, *subdir, *tmp; + + /* Prevent more subdirs from being created */ + mutex_lock(&net->proc_cells_lock); + if (net->dynroot_sb == sb) + net->dynroot_sb = NULL; + mutex_unlock(&net->proc_cells_lock); + + inode_lock(root->d_inode); + + /* Remove all the pins for dirs created for manually added cells */ + list_for_each_entry_safe(subdir, tmp, &root->d_subdirs, d_child) { + if (subdir->d_fsdata) { + subdir->d_fsdata = NULL; + dput(subdir); + } + } + + inode_unlock(root->d_inode); +} diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 5907601aafd0..50929cb91732 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -138,10 +138,6 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, u64 data_version, size; u32 type, abort_code; u8 flags = 0; - int ret; - - if (vnode) - write_seqlock(&vnode->cb_lock); abort_code = ntohl(xdr->abort_code); @@ -154,8 +150,7 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, * case. */ status->abort_code = abort_code; - ret = 0; - goto out; + return 0; } pr_warn("Unknown AFSFetchStatus version %u\n", ntohl(xdr->if_version)); @@ -164,8 +159,7 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, if (abort_code != 0 && inline_error) { status->abort_code = abort_code; - ret = 0; - goto out; + return 0; } type = ntohl(xdr->type); @@ -235,17 +229,35 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, flags); } - ret = 0; - -out: - if (vnode) - write_sequnlock(&vnode->cb_lock); - return ret; + return 0; bad: xdr_dump_bad(*_bp); - ret = afs_protocol_error(call, -EBADMSG); - goto out; + return afs_protocol_error(call, -EBADMSG); +} + +/* + * Decode the file status. We need to lock the target vnode if we're going to + * update its status so that stat() sees the attributes update atomically. + */ +static int afs_decode_status(struct afs_call *call, + const __be32 **_bp, + struct afs_file_status *status, + struct afs_vnode *vnode, + const afs_dataversion_t *expected_version, + struct afs_read *read_req) +{ + int ret; + + if (!vnode) + return xdr_decode_AFSFetchStatus(call, _bp, status, vnode, + expected_version, read_req); + + write_seqlock(&vnode->cb_lock); + ret = xdr_decode_AFSFetchStatus(call, _bp, status, vnode, + expected_version, read_req); + write_sequnlock(&vnode->cb_lock); + return ret; } /* @@ -387,8 +399,8 @@ static int afs_deliver_fs_fetch_status_vnode(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, &vnode->status, vnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); xdr_decode_AFSCallBack(call, vnode, &bp); if (call->reply[1]) @@ -568,8 +580,8 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) return ret; bp = call->buffer; - if (xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, - &vnode->status.data_version, req) < 0) + if (afs_decode_status(call, &bp, &vnode->status, vnode, + &vnode->status.data_version, req) < 0) return afs_protocol_error(call, -EBADMSG); xdr_decode_AFSCallBack(call, vnode, &bp); if (call->reply[1]) @@ -721,9 +733,9 @@ static int afs_deliver_fs_create_vnode(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; xdr_decode_AFSFid(&bp, call->reply[1]); - if (xdr_decode_AFSFetchStatus(call, &bp, call->reply[2], NULL, NULL, NULL) < 0 || - xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, call->reply[2], NULL, NULL, NULL) < 0 || + afs_decode_status(call, &bp, &vnode->status, vnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); xdr_decode_AFSCallBack_raw(&bp, call->reply[3]); /* xdr_decode_AFSVolSync(&bp, call->reply[X]); */ @@ -827,8 +839,8 @@ static int afs_deliver_fs_remove(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, &vnode->status, vnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); /* xdr_decode_AFSVolSync(&bp, call->reply[X]); */ @@ -917,9 +929,9 @@ static int afs_deliver_fs_link(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, NULL, NULL) < 0 || - xdr_decode_AFSFetchStatus(call, &bp, &dvnode->status, dvnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, &vnode->status, vnode, NULL, NULL) < 0 || + afs_decode_status(call, &bp, &dvnode->status, dvnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); /* xdr_decode_AFSVolSync(&bp, call->reply[X]); */ @@ -1004,9 +1016,9 @@ static int afs_deliver_fs_symlink(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; xdr_decode_AFSFid(&bp, call->reply[1]); - if (xdr_decode_AFSFetchStatus(call, &bp, call->reply[2], NULL, NULL, NULL) || - xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, call->reply[2], NULL, NULL, NULL) || + afs_decode_status(call, &bp, &vnode->status, vnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); /* xdr_decode_AFSVolSync(&bp, call->reply[X]); */ @@ -1110,12 +1122,12 @@ static int afs_deliver_fs_rename(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (xdr_decode_AFSFetchStatus(call, &bp, &orig_dvnode->status, orig_dvnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, &orig_dvnode->status, orig_dvnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); if (new_dvnode != orig_dvnode && - xdr_decode_AFSFetchStatus(call, &bp, &new_dvnode->status, new_dvnode, - &call->expected_version_2, NULL) < 0) + afs_decode_status(call, &bp, &new_dvnode->status, new_dvnode, + &call->expected_version_2, NULL) < 0) return afs_protocol_error(call, -EBADMSG); /* xdr_decode_AFSVolSync(&bp, call->reply[X]); */ @@ -1219,8 +1231,8 @@ static int afs_deliver_fs_store_data(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, &vnode->status, vnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); /* xdr_decode_AFSVolSync(&bp, call->reply[X]); */ @@ -1395,8 +1407,8 @@ static int afs_deliver_fs_store_status(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - if (xdr_decode_AFSFetchStatus(call, &bp, &vnode->status, vnode, - &call->expected_version, NULL) < 0) + if (afs_decode_status(call, &bp, &vnode->status, vnode, + &call->expected_version, NULL) < 0) return afs_protocol_error(call, -EBADMSG); /* xdr_decode_AFSVolSync(&bp, call->reply[X]); */ @@ -2097,8 +2109,8 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call) /* unmarshall the reply once we've received all of it */ bp = call->buffer; - xdr_decode_AFSFetchStatus(call, &bp, status, vnode, - &call->expected_version, NULL); + afs_decode_status(call, &bp, status, vnode, + &call->expected_version, NULL); callback[call->count].version = ntohl(bp[0]); callback[call->count].expiry = ntohl(bp[1]); callback[call->count].type = ntohl(bp[2]); @@ -2209,9 +2221,9 @@ static int afs_deliver_fs_inline_bulk_status(struct afs_call *call) bp = call->buffer; statuses = call->reply[1]; - if (xdr_decode_AFSFetchStatus(call, &bp, &statuses[call->count], - call->count == 0 ? vnode : NULL, - NULL, NULL) < 0) + if (afs_decode_status(call, &bp, &statuses[call->count], + call->count == 0 ? vnode : NULL, + NULL, NULL) < 0) return afs_protocol_error(call, -EBADMSG); call->count++; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index e3f8a46663db..9778df135717 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -22,6 +22,8 @@ #include <linux/backing-dev.h> #include <linux/uuid.h> #include <net/net_namespace.h> +#include <net/netns/generic.h> +#include <net/sock.h> #include <net/af_rxrpc.h> #include "afs.h" @@ -40,7 +42,8 @@ struct afs_mount_params { afs_voltype_t type; /* type of volume requested */ int volnamesz; /* size of volume name */ const char *volname; /* name of volume to mount */ - struct afs_net *net; /* Network namespace in effect */ + struct net *net_ns; /* Network namespace in effect */ + struct afs_net *net; /* the AFS net namespace stuff */ struct afs_cell *cell; /* cell in which to find volume */ struct afs_volume *volume; /* volume record */ struct key *key; /* key to use for secure mounting */ @@ -189,7 +192,7 @@ struct afs_read { * - there's one superblock per volume */ struct afs_super_info { - struct afs_net *net; /* Network namespace */ + struct net *net_ns; /* Network namespace */ struct afs_cell *cell; /* The cell in which the volume resides */ struct afs_volume *volume; /* volume record */ bool dyn_root; /* True if dynamic root */ @@ -210,7 +213,6 @@ struct afs_sysnames { char *subs[AFS_NR_SYSNAME]; refcount_t usage; unsigned short nr; - short error; char blank[1]; }; @@ -218,6 +220,7 @@ struct afs_sysnames { * AFS network namespace record. */ struct afs_net { + struct net *net; /* Backpointer to the owning net namespace */ struct afs_uuid uuid; bool live; /* F if this namespace is being removed */ @@ -231,13 +234,13 @@ struct afs_net { /* Cell database */ struct rb_root cells; - struct afs_cell *ws_cell; + struct afs_cell __rcu *ws_cell; struct work_struct cells_manager; struct timer_list cells_timer; atomic_t cells_outstanding; seqlock_t cells_lock; - spinlock_t proc_cells_lock; + struct mutex proc_cells_lock; struct list_head proc_cells; /* Known servers. Theoretically each fileserver can only be in one @@ -261,6 +264,7 @@ struct afs_net { struct mutex lock_manager_mutex; /* Misc */ + struct super_block *dynroot_sb; /* Dynamic root mount superblock */ struct proc_dir_entry *proc_afs; /* /proc/net/afs directory */ struct afs_sysnames *sysnames; rwlock_t sysnames_lock; @@ -280,7 +284,6 @@ struct afs_net { }; extern const char afs_init_sysname[]; -extern struct afs_net __afs_net;// Dummy AFS network namespace; TODO: replace with real netns enum afs_cell_state { AFS_CELL_UNSET, @@ -404,16 +407,27 @@ struct afs_server { rwlock_t fs_lock; /* access lock */ /* callback promise management */ - struct list_head cb_interests; /* List of superblocks using this server */ + struct hlist_head cb_volumes; /* List of volume interests on this server */ unsigned cb_s_break; /* Break-everything counter. */ rwlock_t cb_break_lock; /* Volume finding lock */ }; /* + * Volume collation in the server's callback interest list. + */ +struct afs_vol_interest { + struct hlist_node srv_link; /* Link in server->cb_volumes */ + struct hlist_head cb_interests; /* List of callback interests on the server */ + afs_volid_t vid; /* Volume ID to match */ + unsigned int usage; +}; + +/* * Interest by a superblock on a server. */ struct afs_cb_interest { - struct list_head cb_link; /* Link in server->cb_interests */ + struct hlist_node cb_vlink; /* Link in vol_interest->cb_interests */ + struct afs_vol_interest *vol_interest; struct afs_server *server; /* Server on which this interest resides */ struct super_block *sb; /* Superblock on which inodes reside */ afs_volid_t vid; /* Volume ID to match */ @@ -720,6 +734,10 @@ extern const struct inode_operations afs_dynroot_inode_operations; extern const struct dentry_operations afs_dynroot_dentry_operations; extern struct inode *afs_try_auto_mntpt(struct dentry *, struct inode *); +extern int afs_dynroot_mkdir(struct afs_net *, struct afs_cell *); +extern void afs_dynroot_rmdir(struct afs_net *, struct afs_cell *); +extern int afs_dynroot_populate(struct super_block *); +extern void afs_dynroot_depopulate(struct super_block *); /* * file.c @@ -806,34 +824,36 @@ extern int afs_drop_inode(struct inode *); * main.c */ extern struct workqueue_struct *afs_wq; +extern int afs_net_id; -static inline struct afs_net *afs_d2net(struct dentry *dentry) +static inline struct afs_net *afs_net(struct net *net) { - return &__afs_net; + return net_generic(net, afs_net_id); } -static inline struct afs_net *afs_i2net(struct inode *inode) +static inline struct afs_net *afs_sb2net(struct super_block *sb) { - return &__afs_net; + return afs_net(AFS_FS_S(sb)->net_ns); } -static inline struct afs_net *afs_v2net(struct afs_vnode *vnode) +static inline struct afs_net *afs_d2net(struct dentry *dentry) { - return &__afs_net; + return afs_sb2net(dentry->d_sb); } -static inline struct afs_net *afs_sock2net(struct sock *sk) +static inline struct afs_net *afs_i2net(struct inode *inode) { - return &__afs_net; + return afs_sb2net(inode->i_sb); } -static inline struct afs_net *afs_get_net(struct afs_net *net) +static inline struct afs_net *afs_v2net(struct afs_vnode *vnode) { - return net; + return afs_i2net(&vnode->vfs_inode); } -static inline void afs_put_net(struct afs_net *net) +static inline struct afs_net *afs_sock2net(struct sock *sk) { + return net_generic(sock_net(sk), afs_net_id); } static inline void __afs_stat(atomic_t *s) @@ -861,16 +881,25 @@ extern void afs_mntpt_kill_timer(void); /* * netdevices.c */ -extern int afs_get_ipv4_interfaces(struct afs_interface *, size_t, bool); +extern int afs_get_ipv4_interfaces(struct afs_net *, struct afs_interface *, + size_t, bool); /* * proc.c */ +#ifdef CONFIG_PROC_FS extern int __net_init afs_proc_init(struct afs_net *); extern void __net_exit afs_proc_cleanup(struct afs_net *); -extern int afs_proc_cell_setup(struct afs_net *, struct afs_cell *); -extern void afs_proc_cell_remove(struct afs_net *, struct afs_cell *); +extern int afs_proc_cell_setup(struct afs_cell *); +extern void afs_proc_cell_remove(struct afs_cell *); extern void afs_put_sysnames(struct afs_sysnames *); +#else +static inline int afs_proc_init(struct afs_net *net) { return 0; } +static inline void afs_proc_cleanup(struct afs_net *net) {} +static inline int afs_proc_cell_setup(struct afs_cell *cell) { return 0; } +static inline void afs_proc_cell_remove(struct afs_cell *cell) {} +static inline void afs_put_sysnames(struct afs_sysnames *sysnames) {} +#endif /* * rotate.c @@ -1002,7 +1031,7 @@ extern bool afs_annotate_server_list(struct afs_server_list *, struct afs_server * super.c */ extern int __init afs_fs_init(void); -extern void __exit afs_fs_exit(void); +extern void afs_fs_exit(void); /* * vlclient.c diff --git a/fs/afs/main.c b/fs/afs/main.c index d7560168b3bf..e84fe822a960 100644 --- a/fs/afs/main.c +++ b/fs/afs/main.c @@ -15,6 +15,7 @@ #include <linux/completion.h> #include <linux/sched.h> #include <linux/random.h> +#include <linux/proc_fs.h> #define CREATE_TRACE_POINTS #include "internal.h" @@ -32,7 +33,7 @@ module_param(rootcell, charp, 0); MODULE_PARM_DESC(rootcell, "root AFS cell name and VL server IP addr list"); struct workqueue_struct *afs_wq; -struct afs_net __afs_net; +static struct proc_dir_entry *afs_proc_symlink; #if defined(CONFIG_ALPHA) const char afs_init_sysname[] = "alpha_linux26"; @@ -67,11 +68,13 @@ const char afs_init_sysname[] = "unknown_linux26"; /* * Initialise an AFS network namespace record. */ -static int __net_init afs_net_init(struct afs_net *net) +static int __net_init afs_net_init(struct net *net_ns) { struct afs_sysnames *sysnames; + struct afs_net *net = afs_net(net_ns); int ret; + net->net = net_ns; net->live = true; generate_random_uuid((unsigned char *)&net->uuid); @@ -83,7 +86,7 @@ static int __net_init afs_net_init(struct afs_net *net) INIT_WORK(&net->cells_manager, afs_manage_cells); timer_setup(&net->cells_timer, afs_cells_timer, 0); - spin_lock_init(&net->proc_cells_lock); + mutex_init(&net->proc_cells_lock); INIT_LIST_HEAD(&net->proc_cells); seqlock_init(&net->fs_lock); @@ -142,8 +145,10 @@ error_sysnames: /* * Clean up and destroy an AFS network namespace record. */ -static void __net_exit afs_net_exit(struct afs_net *net) +static void __net_exit afs_net_exit(struct net *net_ns) { + struct afs_net *net = afs_net(net_ns); + net->live = false; afs_cell_purge(net); afs_purge_servers(net); @@ -152,6 +157,13 @@ static void __net_exit afs_net_exit(struct afs_net *net) afs_put_sysnames(net->sysnames); } +static struct pernet_operations afs_net_ops = { + .init = afs_net_init, + .exit = afs_net_exit, + .id = &afs_net_id, + .size = sizeof(struct afs_net), +}; + /* * initialise the AFS client FS module */ @@ -178,7 +190,7 @@ static int __init afs_init(void) goto error_cache; #endif - ret = afs_net_init(&__afs_net); + ret = register_pernet_subsys(&afs_net_ops); if (ret < 0) goto error_net; @@ -187,10 +199,18 @@ static int __init afs_init(void) if (ret < 0) goto error_fs; + afs_proc_symlink = proc_symlink("fs/afs", NULL, "../self/net/afs"); + if (IS_ERR(afs_proc_symlink)) { + ret = PTR_ERR(afs_proc_symlink); + goto error_proc; + } + return ret; +error_proc: + afs_fs_exit(); error_fs: - afs_net_exit(&__afs_net); + unregister_pernet_subsys(&afs_net_ops); error_net: #ifdef CONFIG_AFS_FSCACHE fscache_unregister_netfs(&afs_cache_netfs); @@ -219,8 +239,9 @@ static void __exit afs_exit(void) { printk(KERN_INFO "kAFS: Red Hat AFS client v0.1 unregistering.\n"); + proc_remove(afs_proc_symlink); afs_fs_exit(); - afs_net_exit(&__afs_net); + unregister_pernet_subsys(&afs_net_ops); #ifdef CONFIG_AFS_FSCACHE fscache_unregister_netfs(&afs_cache_netfs); #endif diff --git a/fs/afs/netdevices.c b/fs/afs/netdevices.c index 50bd5bb1c4fb..2a009d1939d7 100644 --- a/fs/afs/netdevices.c +++ b/fs/afs/netdevices.c @@ -17,8 +17,8 @@ * - maxbufs must be at least 1 * - returns the number of interface records in the buffer */ -int afs_get_ipv4_interfaces(struct afs_interface *bufs, size_t maxbufs, - bool wantloopback) +int afs_get_ipv4_interfaces(struct afs_net *net, struct afs_interface *bufs, + size_t maxbufs, bool wantloopback) { struct net_device *dev; struct in_device *idev; @@ -27,7 +27,7 @@ int afs_get_ipv4_interfaces(struct afs_interface *bufs, size_t maxbufs, ASSERT(maxbufs > 0); rtnl_lock(); - for_each_netdev(&init_net, dev) { + for_each_netdev(net->net, dev) { if (dev->type == ARPHRD_LOOPBACK && !wantloopback) continue; idev = __in_dev_get_rtnl(dev); diff --git a/fs/afs/proc.c b/fs/afs/proc.c index 3aad32762989..0c3285c8db95 100644 --- a/fs/afs/proc.c +++ b/fs/afs/proc.c @@ -17,240 +17,78 @@ #include <linux/uaccess.h> #include "internal.h" -static inline struct afs_net *afs_proc2net(struct file *f) +static inline struct afs_net *afs_seq2net(struct seq_file *m) { - return &__afs_net; + return afs_net(seq_file_net(m)); } -static inline struct afs_net *afs_seq2net(struct seq_file *m) +static inline struct afs_net *afs_seq2net_single(struct seq_file *m) { - return &__afs_net; // TODO: use seq_file_net(m) + return afs_net(seq_file_single_net(m)); } -static int afs_proc_cells_open(struct inode *inode, struct file *file); -static void *afs_proc_cells_start(struct seq_file *p, loff_t *pos); -static void *afs_proc_cells_next(struct seq_file *p, void *v, loff_t *pos); -static void afs_proc_cells_stop(struct seq_file *p, void *v); -static int afs_proc_cells_show(struct seq_file *m, void *v); -static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf, - size_t size, loff_t *_pos); - -static const struct seq_operations afs_proc_cells_ops = { - .start = afs_proc_cells_start, - .next = afs_proc_cells_next, - .stop = afs_proc_cells_stop, - .show = afs_proc_cells_show, -}; - -static const struct file_operations afs_proc_cells_fops = { - .open = afs_proc_cells_open, - .read = seq_read, - .write = afs_proc_cells_write, - .llseek = seq_lseek, - .release = seq_release, -}; - -static ssize_t afs_proc_rootcell_read(struct file *file, char __user *buf, - size_t size, loff_t *_pos); -static ssize_t afs_proc_rootcell_write(struct file *file, - const char __user *buf, - size_t size, loff_t *_pos); - -static const struct file_operations afs_proc_rootcell_fops = { - .read = afs_proc_rootcell_read, - .write = afs_proc_rootcell_write, - .llseek = no_llseek, -}; - -static void *afs_proc_cell_volumes_start(struct seq_file *p, loff_t *pos); -static void *afs_proc_cell_volumes_next(struct seq_file *p, void *v, - loff_t *pos); -static void afs_proc_cell_volumes_stop(struct seq_file *p, void *v); -static int afs_proc_cell_volumes_show(struct seq_file *m, void *v); - -static const struct seq_operations afs_proc_cell_volumes_ops = { - .start = afs_proc_cell_volumes_start, - .next = afs_proc_cell_volumes_next, - .stop = afs_proc_cell_volumes_stop, - .show = afs_proc_cell_volumes_show, -}; - -static void *afs_proc_cell_vlservers_start(struct seq_file *p, loff_t *pos); -static void *afs_proc_cell_vlservers_next(struct seq_file *p, void *v, - loff_t *pos); -static void afs_proc_cell_vlservers_stop(struct seq_file *p, void *v); -static int afs_proc_cell_vlservers_show(struct seq_file *m, void *v); - -static const struct seq_operations afs_proc_cell_vlservers_ops = { - .start = afs_proc_cell_vlservers_start, - .next = afs_proc_cell_vlservers_next, - .stop = afs_proc_cell_vlservers_stop, - .show = afs_proc_cell_vlservers_show, -}; - -static void *afs_proc_servers_start(struct seq_file *p, loff_t *pos); -static void *afs_proc_servers_next(struct seq_file *p, void *v, - loff_t *pos); -static void afs_proc_servers_stop(struct seq_file *p, void *v); -static int afs_proc_servers_show(struct seq_file *m, void *v); - -static const struct seq_operations afs_proc_servers_ops = { - .start = afs_proc_servers_start, - .next = afs_proc_servers_next, - .stop = afs_proc_servers_stop, - .show = afs_proc_servers_show, -}; - -static int afs_proc_sysname_open(struct inode *inode, struct file *file); -static int afs_proc_sysname_release(struct inode *inode, struct file *file); -static void *afs_proc_sysname_start(struct seq_file *p, loff_t *pos); -static void *afs_proc_sysname_next(struct seq_file *p, void *v, - loff_t *pos); -static void afs_proc_sysname_stop(struct seq_file *p, void *v); -static int afs_proc_sysname_show(struct seq_file *m, void *v); -static ssize_t afs_proc_sysname_write(struct file *file, - const char __user *buf, - size_t size, loff_t *_pos); - -static const struct seq_operations afs_proc_sysname_ops = { - .start = afs_proc_sysname_start, - .next = afs_proc_sysname_next, - .stop = afs_proc_sysname_stop, - .show = afs_proc_sysname_show, -}; - -static const struct file_operations afs_proc_sysname_fops = { - .open = afs_proc_sysname_open, - .read = seq_read, - .llseek = seq_lseek, - .release = afs_proc_sysname_release, - .write = afs_proc_sysname_write, -}; - -static int afs_proc_stats_show(struct seq_file *m, void *v); - /* - * initialise the /proc/fs/afs/ directory + * Display the list of cells known to the namespace. */ -int afs_proc_init(struct afs_net *net) +static int afs_proc_cells_show(struct seq_file *m, void *v) { - _enter(""); - - net->proc_afs = proc_mkdir("fs/afs", NULL); - if (!net->proc_afs) - goto error_dir; + struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link); + struct afs_net *net = afs_seq2net(m); - if (!proc_create("cells", 0644, net->proc_afs, &afs_proc_cells_fops) || - !proc_create("rootcell", 0644, net->proc_afs, &afs_proc_rootcell_fops) || - !proc_create_seq("servers", 0644, net->proc_afs, &afs_proc_servers_ops) || - !proc_create_single("stats", 0644, net->proc_afs, afs_proc_stats_show) || - !proc_create("sysname", 0644, net->proc_afs, &afs_proc_sysname_fops)) - goto error_tree; + if (v == &net->proc_cells) { + /* display header on line 1 */ + seq_puts(m, "USE NAME\n"); + return 0; + } - _leave(" = 0"); + /* display one cell per line on subsequent lines */ + seq_printf(m, "%3u %s\n", atomic_read(&cell->usage), cell->name); return 0; - -error_tree: - proc_remove(net->proc_afs); -error_dir: - _leave(" = -ENOMEM"); - return -ENOMEM; -} - -/* - * clean up the /proc/fs/afs/ directory - */ -void afs_proc_cleanup(struct afs_net *net) -{ - proc_remove(net->proc_afs); - net->proc_afs = NULL; -} - -/* - * open "/proc/fs/afs/cells" which provides a summary of extant cells - */ -static int afs_proc_cells_open(struct inode *inode, struct file *file) -{ - return seq_open(file, &afs_proc_cells_ops); } -/* - * set up the iterator to start reading from the cells list and return the - * first item - */ static void *afs_proc_cells_start(struct seq_file *m, loff_t *_pos) __acquires(rcu) { - struct afs_net *net = afs_seq2net(m); - rcu_read_lock(); - return seq_list_start_head(&net->proc_cells, *_pos); + return seq_list_start_head(&afs_seq2net(m)->proc_cells, *_pos); } -/* - * move to next cell in cells list - */ static void *afs_proc_cells_next(struct seq_file *m, void *v, loff_t *pos) { - struct afs_net *net = afs_seq2net(m); - - return seq_list_next(v, &net->proc_cells, pos); + return seq_list_next(v, &afs_seq2net(m)->proc_cells, pos); } -/* - * clean up after reading from the cells list - */ static void afs_proc_cells_stop(struct seq_file *m, void *v) __releases(rcu) { rcu_read_unlock(); } -/* - * display a header line followed by a load of cell lines - */ -static int afs_proc_cells_show(struct seq_file *m, void *v) -{ - struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link); - struct afs_net *net = afs_seq2net(m); - - if (v == &net->proc_cells) { - /* display header on line 1 */ - seq_puts(m, "USE NAME\n"); - return 0; - } - - /* display one cell per line on subsequent lines */ - seq_printf(m, "%3u %s\n", atomic_read(&cell->usage), cell->name); - return 0; -} +static const struct seq_operations afs_proc_cells_ops = { + .start = afs_proc_cells_start, + .next = afs_proc_cells_next, + .stop = afs_proc_cells_stop, + .show = afs_proc_cells_show, +}; /* * handle writes to /proc/fs/afs/cells * - to add cells: echo "add <cellname> <IP>[:<IP>][:<IP>]" */ -static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf, - size_t size, loff_t *_pos) +static int afs_proc_cells_write(struct file *file, char *buf, size_t size) { - struct afs_net *net = afs_proc2net(file); - char *kbuf, *name, *args; + struct seq_file *m = file->private_data; + struct afs_net *net = afs_seq2net(m); + char *name, *args; int ret; - /* start by dragging the command into memory */ - if (size <= 1 || size >= PAGE_SIZE) - return -EINVAL; - - kbuf = memdup_user_nul(buf, size); - if (IS_ERR(kbuf)) - return PTR_ERR(kbuf); - /* trim to first NL */ - name = memchr(kbuf, '\n', size); + name = memchr(buf, '\n', size); if (name) *name = 0; /* split into command, name and argslist */ - name = strchr(kbuf, ' '); + name = strchr(buf, ' '); if (!name) goto inval; do { @@ -269,9 +107,9 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf, goto inval; /* determine command to perform */ - _debug("cmd=%s name=%s args=%s", kbuf, name, args); + _debug("cmd=%s name=%s args=%s", buf, name, args); - if (strcmp(kbuf, "add") == 0) { + if (strcmp(buf, "add") == 0) { struct afs_cell *cell; cell = afs_lookup_cell(net, name, strlen(name), args, true); @@ -287,10 +125,9 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf, goto inval; } - ret = size; + ret = 0; done: - kfree(kbuf); _leave(" = %d", ret); return ret; @@ -300,200 +137,136 @@ inval: goto done; } -static ssize_t afs_proc_rootcell_read(struct file *file, char __user *buf, - size_t size, loff_t *_pos) +/* + * Display the name of the current workstation cell. + */ +static int afs_proc_rootcell_show(struct seq_file *m, void *v) { struct afs_cell *cell; - struct afs_net *net = afs_proc2net(file); - unsigned int seq = 0; - char name[AFS_MAXCELLNAME + 1]; - int len; - - if (*_pos > 0) - return 0; - if (!net->ws_cell) - return 0; - - rcu_read_lock(); - do { - read_seqbegin_or_lock(&net->cells_lock, &seq); - len = 0; - cell = rcu_dereference_raw(net->ws_cell); - if (cell) { - len = cell->name_len; - memcpy(name, cell->name, len); - } - } while (need_seqretry(&net->cells_lock, seq)); - done_seqretry(&net->cells_lock, seq); - rcu_read_unlock(); - - if (!len) - return 0; - - name[len++] = '\n'; - if (len > size) - len = size; - if (copy_to_user(buf, name, len) != 0) - return -EFAULT; - *_pos = 1; - return len; + struct afs_net *net; + + net = afs_seq2net_single(m); + if (rcu_access_pointer(net->ws_cell)) { + rcu_read_lock(); + cell = rcu_dereference(net->ws_cell); + if (cell) + seq_printf(m, "%s\n", cell->name); + rcu_read_unlock(); + } + return 0; } /* - * handle writes to /proc/fs/afs/rootcell - * - to initialize rootcell: echo "cell.name:192.168.231.14" + * Set the current workstation cell and optionally supply its list of volume + * location servers. + * + * echo "cell.name:192.168.231.14" >/proc/fs/afs/rootcell */ -static ssize_t afs_proc_rootcell_write(struct file *file, - const char __user *buf, - size_t size, loff_t *_pos) +static int afs_proc_rootcell_write(struct file *file, char *buf, size_t size) { - struct afs_net *net = afs_proc2net(file); - char *kbuf, *s; + struct seq_file *m = file->private_data; + struct afs_net *net = afs_seq2net_single(m); + char *s; int ret; - /* start by dragging the command into memory */ - if (size <= 1 || size >= PAGE_SIZE) - return -EINVAL; - - kbuf = memdup_user_nul(buf, size); - if (IS_ERR(kbuf)) - return PTR_ERR(kbuf); - ret = -EINVAL; - if (kbuf[0] == '.') + if (buf[0] == '.') goto out; - if (memchr(kbuf, '/', size)) + if (memchr(buf, '/', size)) goto out; /* trim to first NL */ - s = memchr(kbuf, '\n', size); + s = memchr(buf, '\n', size); if (s) *s = 0; /* determine command to perform */ - _debug("rootcell=%s", kbuf); + _debug("rootcell=%s", buf); - ret = afs_cell_init(net, kbuf); - if (ret >= 0) - ret = size; /* consume everything, always */ + ret = afs_cell_init(net, buf); out: - kfree(kbuf); _leave(" = %d", ret); return ret; } +static const char afs_vol_types[3][3] = { + [AFSVL_RWVOL] = "RW", + [AFSVL_ROVOL] = "RO", + [AFSVL_BACKVOL] = "BK", +}; + /* - * initialise /proc/fs/afs/<cell>/ + * Display the list of volumes known to a cell. */ -int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell) +static int afs_proc_cell_volumes_show(struct seq_file *m, void *v) { - struct proc_dir_entry *dir; - - _enter("%p{%s},%p", cell, cell->name, net->proc_afs); + struct afs_cell *cell = PDE_DATA(file_inode(m->file)); + struct afs_volume *vol = list_entry(v, struct afs_volume, proc_link); - dir = proc_mkdir(cell->name, net->proc_afs); - if (!dir) - goto error_dir; + /* Display header on line 1 */ + if (v == &cell->proc_volumes) { + seq_puts(m, "USE VID TY\n"); + return 0; + } - if (!proc_create_seq_data("vlservers", 0, dir, - &afs_proc_cell_vlservers_ops, cell)) - goto error_tree; - if (!proc_create_seq_data("volumes", 0, dir, &afs_proc_cell_volumes_ops, - cell)) - goto error_tree; + seq_printf(m, "%3d %08x %s\n", + atomic_read(&vol->usage), vol->vid, + afs_vol_types[vol->type]); - _leave(" = 0"); return 0; - -error_tree: - remove_proc_subtree(cell->name, net->proc_afs); -error_dir: - _leave(" = -ENOMEM"); - return -ENOMEM; } -/* - * remove /proc/fs/afs/<cell>/ - */ -void afs_proc_cell_remove(struct afs_net *net, struct afs_cell *cell) -{ - _enter(""); - - remove_proc_subtree(cell->name, net->proc_afs); - - _leave(""); -} - -/* - * set up the iterator to start reading from the cells list and return the - * first item - */ static void *afs_proc_cell_volumes_start(struct seq_file *m, loff_t *_pos) __acquires(cell->proc_lock) { struct afs_cell *cell = PDE_DATA(file_inode(m->file)); - _enter("cell=%p pos=%Ld", cell, *_pos); - read_lock(&cell->proc_lock); return seq_list_start_head(&cell->proc_volumes, *_pos); } -/* - * move to next cell in cells list - */ -static void *afs_proc_cell_volumes_next(struct seq_file *p, void *v, +static void *afs_proc_cell_volumes_next(struct seq_file *m, void *v, loff_t *_pos) { - struct afs_cell *cell = PDE_DATA(file_inode(p->file)); + struct afs_cell *cell = PDE_DATA(file_inode(m->file)); - _enter("cell=%p pos=%Ld", cell, *_pos); return seq_list_next(v, &cell->proc_volumes, _pos); } -/* - * clean up after reading from the cells list - */ -static void afs_proc_cell_volumes_stop(struct seq_file *p, void *v) +static void afs_proc_cell_volumes_stop(struct seq_file *m, void *v) __releases(cell->proc_lock) { - struct afs_cell *cell = PDE_DATA(file_inode(p->file)); + struct afs_cell *cell = PDE_DATA(file_inode(m->file)); read_unlock(&cell->proc_lock); } -static const char afs_vol_types[3][3] = { - [AFSVL_RWVOL] = "RW", - [AFSVL_ROVOL] = "RO", - [AFSVL_BACKVOL] = "BK", +static const struct seq_operations afs_proc_cell_volumes_ops = { + .start = afs_proc_cell_volumes_start, + .next = afs_proc_cell_volumes_next, + .stop = afs_proc_cell_volumes_stop, + .show = afs_proc_cell_volumes_show, }; /* - * display a header line followed by a load of volume lines + * Display the list of Volume Location servers we're using for a cell. */ -static int afs_proc_cell_volumes_show(struct seq_file *m, void *v) +static int afs_proc_cell_vlservers_show(struct seq_file *m, void *v) { - struct afs_cell *cell = PDE_DATA(file_inode(m->file)); - struct afs_volume *vol = list_entry(v, struct afs_volume, proc_link); + struct sockaddr_rxrpc *addr = v; - /* Display header on line 1 */ - if (v == &cell->proc_volumes) { - seq_puts(m, "USE VID TY\n"); + /* display header on line 1 */ + if (v == (void *)1) { + seq_puts(m, "ADDRESS\n"); return 0; } - seq_printf(m, "%3d %08x %s\n", - atomic_read(&vol->usage), vol->vid, - afs_vol_types[vol->type]); - + /* display one cell per line on subsequent lines */ + seq_printf(m, "%pISp\n", &addr->transport); return 0; } -/* - * set up the iterator to start reading from the cells list and return the - * first item - */ static void *afs_proc_cell_vlservers_start(struct seq_file *m, loff_t *_pos) __acquires(rcu) { @@ -516,14 +289,11 @@ static void *afs_proc_cell_vlservers_start(struct seq_file *m, loff_t *_pos) return alist->addrs + pos; } -/* - * move to next cell in cells list - */ -static void *afs_proc_cell_vlservers_next(struct seq_file *p, void *v, +static void *afs_proc_cell_vlservers_next(struct seq_file *m, void *v, loff_t *_pos) { struct afs_addr_list *alist; - struct afs_cell *cell = PDE_DATA(file_inode(p->file)); + struct afs_cell *cell = PDE_DATA(file_inode(m->file)); loff_t pos; alist = rcu_dereference(cell->vl_addrs); @@ -536,161 +306,145 @@ static void *afs_proc_cell_vlservers_next(struct seq_file *p, void *v, return alist->addrs + pos; } -/* - * clean up after reading from the cells list - */ -static void afs_proc_cell_vlservers_stop(struct seq_file *p, void *v) +static void afs_proc_cell_vlservers_stop(struct seq_file *m, void *v) __releases(rcu) { rcu_read_unlock(); } +static const struct seq_operations afs_proc_cell_vlservers_ops = { + .start = afs_proc_cell_vlservers_start, + .next = afs_proc_cell_vlservers_next, + .stop = afs_proc_cell_vlservers_stop, + .show = afs_proc_cell_vlservers_show, +}; + /* - * display a header line followed by a load of volume lines + * Display the list of fileservers we're using within a namespace. */ -static int afs_proc_cell_vlservers_show(struct seq_file *m, void *v) +static int afs_proc_servers_show(struct seq_file *m, void *v) { - struct sockaddr_rxrpc *addr = v; + struct afs_server *server; + struct afs_addr_list *alist; + int i; - /* display header on line 1 */ - if (v == (void *)1) { - seq_puts(m, "ADDRESS\n"); + if (v == SEQ_START_TOKEN) { + seq_puts(m, "UUID USE ADDR\n"); return 0; } - /* display one cell per line on subsequent lines */ - seq_printf(m, "%pISp\n", &addr->transport); + server = list_entry(v, struct afs_server, proc_link); + alist = rcu_dereference(server->addresses); + seq_printf(m, "%pU %3d %pISpc%s\n", + &server->uuid, + atomic_read(&server->usage), + &alist->addrs[0].transport, + alist->index == 0 ? "*" : ""); + for (i = 1; i < alist->nr_addrs; i++) + seq_printf(m, " %pISpc%s\n", + &alist->addrs[i].transport, + alist->index == i ? "*" : ""); return 0; } -/* - * Set up the iterator to start reading from the server list and return the - * first item. - */ static void *afs_proc_servers_start(struct seq_file *m, loff_t *_pos) __acquires(rcu) { - struct afs_net *net = afs_seq2net(m); - rcu_read_lock(); - return seq_hlist_start_head_rcu(&net->fs_proc, *_pos); + return seq_hlist_start_head_rcu(&afs_seq2net(m)->fs_proc, *_pos); } -/* - * move to next cell in cells list - */ static void *afs_proc_servers_next(struct seq_file *m, void *v, loff_t *_pos) { - struct afs_net *net = afs_seq2net(m); - - return seq_hlist_next_rcu(v, &net->fs_proc, _pos); + return seq_hlist_next_rcu(v, &afs_seq2net(m)->fs_proc, _pos); } -/* - * clean up after reading from the cells list - */ -static void afs_proc_servers_stop(struct seq_file *p, void *v) +static void afs_proc_servers_stop(struct seq_file *m, void *v) __releases(rcu) { rcu_read_unlock(); } +static const struct seq_operations afs_proc_servers_ops = { + .start = afs_proc_servers_start, + .next = afs_proc_servers_next, + .stop = afs_proc_servers_stop, + .show = afs_proc_servers_show, +}; + /* - * display a header line followed by a load of volume lines + * Display the list of strings that may be substituted for the @sys pathname + * macro. */ -static int afs_proc_servers_show(struct seq_file *m, void *v) +static int afs_proc_sysname_show(struct seq_file *m, void *v) { - struct afs_server *server; - struct afs_addr_list *alist; - - if (v == SEQ_START_TOKEN) { - seq_puts(m, "UUID USE ADDR\n"); - return 0; - } + struct afs_net *net = afs_seq2net(m); + struct afs_sysnames *sysnames = net->sysnames; + unsigned int i = (unsigned long)v - 1; - server = list_entry(v, struct afs_server, proc_link); - alist = rcu_dereference(server->addresses); - seq_printf(m, "%pU %3d %pISp\n", - &server->uuid, - atomic_read(&server->usage), - &alist->addrs[alist->index].transport); + if (i < sysnames->nr) + seq_printf(m, "%s\n", sysnames->subs[i]); return 0; } -void afs_put_sysnames(struct afs_sysnames *sysnames) +static void *afs_proc_sysname_start(struct seq_file *m, loff_t *pos) + __acquires(&net->sysnames_lock) { - int i; + struct afs_net *net = afs_seq2net(m); + struct afs_sysnames *names; - if (sysnames && refcount_dec_and_test(&sysnames->usage)) { - for (i = 0; i < sysnames->nr; i++) - if (sysnames->subs[i] != afs_init_sysname && - sysnames->subs[i] != sysnames->blank) - kfree(sysnames->subs[i]); - } + read_lock(&net->sysnames_lock); + + names = net->sysnames; + if (*pos >= names->nr) + return NULL; + return (void *)(unsigned long)(*pos + 1); } -/* - * Handle opening of /proc/fs/afs/sysname. If it is opened for writing, we - * assume the caller wants to change the substitution list and we allocate a - * buffer to hold the list. - */ -static int afs_proc_sysname_open(struct inode *inode, struct file *file) +static void *afs_proc_sysname_next(struct seq_file *m, void *v, loff_t *pos) { - struct afs_sysnames *sysnames; - struct seq_file *m; - int ret; - - ret = seq_open(file, &afs_proc_sysname_ops); - if (ret < 0) - return ret; + struct afs_net *net = afs_seq2net(m); + struct afs_sysnames *names = net->sysnames; - if (file->f_mode & FMODE_WRITE) { - sysnames = kzalloc(sizeof(*sysnames), GFP_KERNEL); - if (!sysnames) { - seq_release(inode, file); - return -ENOMEM; - } + *pos += 1; + if (*pos >= names->nr) + return NULL; + return (void *)(unsigned long)(*pos + 1); +} - refcount_set(&sysnames->usage, 1); - m = file->private_data; - m->private = sysnames; - } +static void afs_proc_sysname_stop(struct seq_file *m, void *v) + __releases(&net->sysnames_lock) +{ + struct afs_net *net = afs_seq2net(m); - return 0; + read_unlock(&net->sysnames_lock); } +static const struct seq_operations afs_proc_sysname_ops = { + .start = afs_proc_sysname_start, + .next = afs_proc_sysname_next, + .stop = afs_proc_sysname_stop, + .show = afs_proc_sysname_show, +}; + /* - * Handle writes to /proc/fs/afs/sysname to set the @sys substitution. + * Allow the @sys substitution to be configured. */ -static ssize_t afs_proc_sysname_write(struct file *file, - const char __user *buf, - size_t size, loff_t *_pos) +static int afs_proc_sysname_write(struct file *file, char *buf, size_t size) { - struct afs_sysnames *sysnames; + struct afs_sysnames *sysnames, *kill; struct seq_file *m = file->private_data; - char *kbuf = NULL, *s, *p, *sub; + struct afs_net *net = afs_seq2net(m); + char *s, *p, *sub; int ret, len; - sysnames = m->private; + sysnames = kzalloc(sizeof(*sysnames), GFP_KERNEL); if (!sysnames) - return -EINVAL; - if (sysnames->error) - return sysnames->error; - - if (size >= PAGE_SIZE - 1) { - sysnames->error = -EINVAL; - return -EINVAL; - } - if (size == 0) - return 0; - - kbuf = memdup_user_nul(buf, size); - if (IS_ERR(kbuf)) - return PTR_ERR(kbuf); - - inode_lock(file_inode(file)); + return -ENOMEM; + refcount_set(&sysnames->usage, 1); + kill = sysnames; - p = kbuf; + p = buf; while ((s = strsep(&p, " \t\n"))) { len = strlen(s); if (len == 0) @@ -731,85 +485,36 @@ static ssize_t afs_proc_sysname_write(struct file *file, sysnames->nr++; } - ret = size; /* consume everything, always */ + if (sysnames->nr == 0) { + sysnames->subs[0] = sysnames->blank; + sysnames->nr++; + } + + write_lock(&net->sysnames_lock); + kill = net->sysnames; + net->sysnames = sysnames; + write_unlock(&net->sysnames_lock); + ret = 0; out: - inode_unlock(file_inode(file)); - kfree(kbuf); + afs_put_sysnames(kill); return ret; invalid: ret = -EINVAL; error: - sysnames->error = ret; goto out; } -static int afs_proc_sysname_release(struct inode *inode, struct file *file) +void afs_put_sysnames(struct afs_sysnames *sysnames) { - struct afs_sysnames *sysnames, *kill = NULL; - struct seq_file *m = file->private_data; - struct afs_net *net = afs_seq2net(m); + int i; - sysnames = m->private; - if (sysnames) { - if (!sysnames->error) { - kill = sysnames; - if (sysnames->nr == 0) { - sysnames->subs[0] = sysnames->blank; - sysnames->nr++; - } - write_lock(&net->sysnames_lock); - kill = net->sysnames; - net->sysnames = sysnames; - write_unlock(&net->sysnames_lock); - } - afs_put_sysnames(kill); + if (sysnames && refcount_dec_and_test(&sysnames->usage)) { + for (i = 0; i < sysnames->nr; i++) + if (sysnames->subs[i] != afs_init_sysname && + sysnames->subs[i] != sysnames->blank) + kfree(sysnames->subs[i]); } - - return seq_release(inode, file); -} - -static void *afs_proc_sysname_start(struct seq_file *m, loff_t *pos) - __acquires(&net->sysnames_lock) -{ - struct afs_net *net = afs_seq2net(m); - struct afs_sysnames *names = net->sysnames; - - read_lock(&net->sysnames_lock); - - if (*pos >= names->nr) - return NULL; - return (void *)(unsigned long)(*pos + 1); -} - -static void *afs_proc_sysname_next(struct seq_file *m, void *v, loff_t *pos) -{ - struct afs_net *net = afs_seq2net(m); - struct afs_sysnames *names = net->sysnames; - - *pos += 1; - if (*pos >= names->nr) - return NULL; - return (void *)(unsigned long)(*pos + 1); -} - -static void afs_proc_sysname_stop(struct seq_file *m, void *v) - __releases(&net->sysnames_lock) -{ - struct afs_net *net = afs_seq2net(m); - - read_unlock(&net->sysnames_lock); -} - -static int afs_proc_sysname_show(struct seq_file *m, void *v) -{ - struct afs_net *net = afs_seq2net(m); - struct afs_sysnames *sysnames = net->sysnames; - unsigned int i = (unsigned long)v - 1; - - if (i < sysnames->nr) - seq_printf(m, "%s\n", sysnames->subs[i]); - return 0; } /* @@ -817,7 +522,7 @@ static int afs_proc_sysname_show(struct seq_file *m, void *v) */ static int afs_proc_stats_show(struct seq_file *m, void *v) { - struct afs_net *net = afs_seq2net(m); + struct afs_net *net = afs_seq2net_single(m); seq_puts(m, "kAFS statistics\n"); @@ -842,3 +547,101 @@ static int afs_proc_stats_show(struct seq_file *m, void *v) atomic_long_read(&net->n_store_bytes)); return 0; } + +/* + * initialise /proc/fs/afs/<cell>/ + */ +int afs_proc_cell_setup(struct afs_cell *cell) +{ + struct proc_dir_entry *dir; + struct afs_net *net = cell->net; + + _enter("%p{%s},%p", cell, cell->name, net->proc_afs); + + dir = proc_net_mkdir(net->net, cell->name, net->proc_afs); + if (!dir) + goto error_dir; + + if (!proc_create_net_data("vlservers", 0444, dir, + &afs_proc_cell_vlservers_ops, + sizeof(struct seq_net_private), + cell) || + !proc_create_net_data("volumes", 0444, dir, + &afs_proc_cell_volumes_ops, + sizeof(struct seq_net_private), + cell)) + goto error_tree; + + _leave(" = 0"); + return 0; + +error_tree: + remove_proc_subtree(cell->name, net->proc_afs); +error_dir: + _leave(" = -ENOMEM"); + return -ENOMEM; +} + +/* + * remove /proc/fs/afs/<cell>/ + */ +void afs_proc_cell_remove(struct afs_cell *cell) +{ + struct afs_net *net = cell->net; + + _enter(""); + remove_proc_subtree(cell->name, net->proc_afs); + _leave(""); +} + +/* + * initialise the /proc/fs/afs/ directory + */ +int afs_proc_init(struct afs_net *net) +{ + struct proc_dir_entry *p; + + _enter(""); + + p = proc_net_mkdir(net->net, "afs", net->net->proc_net); + if (!p) + goto error_dir; + + if (!proc_create_net_data_write("cells", 0644, p, + &afs_proc_cells_ops, + afs_proc_cells_write, + sizeof(struct seq_net_private), + NULL) || + !proc_create_net_single_write("rootcell", 0644, p, + afs_proc_rootcell_show, + afs_proc_rootcell_write, + NULL) || + !proc_create_net("servers", 0444, p, &afs_proc_servers_ops, + sizeof(struct seq_net_private)) || + !proc_create_net_single("stats", 0444, p, afs_proc_stats_show, NULL) || + !proc_create_net_data_write("sysname", 0644, p, + &afs_proc_sysname_ops, + afs_proc_sysname_write, + sizeof(struct seq_net_private), + NULL)) + goto error_tree; + + net->proc_afs = p; + _leave(" = 0"); + return 0; + +error_tree: + proc_remove(p); +error_dir: + _leave(" = -ENOMEM"); + return -ENOMEM; +} + +/* + * clean up the /proc/fs/afs/ directory + */ +void afs_proc_cleanup(struct afs_net *net) +{ + proc_remove(net->proc_afs); + net->proc_afs = NULL; +} diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 08735948f15d..a1b18082991b 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -46,7 +46,7 @@ int afs_open_socket(struct afs_net *net) _enter(""); - ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6, &socket); + ret = sock_create_kern(net->net, AF_RXRPC, SOCK_DGRAM, PF_INET6, &socket); if (ret < 0) goto error_1; diff --git a/fs/afs/server.c b/fs/afs/server.c index 3af4625e2f8c..1d329e6981d5 100644 --- a/fs/afs/server.c +++ b/fs/afs/server.c @@ -228,7 +228,7 @@ static struct afs_server *afs_alloc_server(struct afs_net *net, server->flags = (1UL << AFS_SERVER_FL_NEW); server->update_at = ktime_get_real_seconds() + afs_server_update_delay; rwlock_init(&server->fs_lock); - INIT_LIST_HEAD(&server->cb_interests); + INIT_HLIST_HEAD(&server->cb_volumes); rwlock_init(&server->cb_break_lock); afs_inc_servers_outstanding(net); diff --git a/fs/afs/super.c b/fs/afs/super.c index 9e5d7966621c..4d3e274207fb 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -48,6 +48,8 @@ struct file_system_type afs_fs_type = { }; MODULE_ALIAS_FS("afs"); +int afs_net_id; + static const struct super_operations afs_super_ops = { .statfs = afs_statfs, .alloc_inode = afs_alloc_inode, @@ -117,7 +119,7 @@ int __init afs_fs_init(void) /* * clean up the filesystem */ -void __exit afs_fs_exit(void) +void afs_fs_exit(void) { _enter(""); @@ -351,14 +353,19 @@ static int afs_test_super(struct super_block *sb, void *data) struct afs_super_info *as1 = data; struct afs_super_info *as = AFS_FS_S(sb); - return (as->net == as1->net && + return (as->net_ns == as1->net_ns && as->volume && - as->volume->vid == as1->volume->vid); + as->volume->vid == as1->volume->vid && + !as->dyn_root); } static int afs_dynroot_test_super(struct super_block *sb, void *data) { - return false; + struct afs_super_info *as1 = data; + struct afs_super_info *as = AFS_FS_S(sb); + + return (as->net_ns == as1->net_ns && + as->dyn_root); } static int afs_set_super(struct super_block *sb, void *data) @@ -418,10 +425,14 @@ static int afs_fill_super(struct super_block *sb, if (!sb->s_root) goto error; - if (params->dyn_root) + if (as->dyn_root) { sb->s_d_op = &afs_dynroot_dentry_operations; - else + ret = afs_dynroot_populate(sb); + if (ret < 0) + goto error; + } else { sb->s_d_op = &afs_fs_dentry_operations; + } _leave(" = 0"); return 0; @@ -437,7 +448,7 @@ static struct afs_super_info *afs_alloc_sbi(struct afs_mount_params *params) as = kzalloc(sizeof(struct afs_super_info), GFP_KERNEL); if (as) { - as->net = afs_get_net(params->net); + as->net_ns = get_net(params->net_ns); if (params->dyn_root) as->dyn_root = true; else @@ -450,12 +461,31 @@ static void afs_destroy_sbi(struct afs_super_info *as) { if (as) { afs_put_volume(as->cell, as->volume); - afs_put_cell(as->net, as->cell); - afs_put_net(as->net); + afs_put_cell(afs_net(as->net_ns), as->cell); + put_net(as->net_ns); kfree(as); } } +static void afs_kill_super(struct super_block *sb) +{ + struct afs_super_info *as = AFS_FS_S(sb); + struct afs_net *net = afs_net(as->net_ns); + + if (as->dyn_root) + afs_dynroot_depopulate(sb); + + /* Clear the callback interests (which will do ilookup5) before + * deactivating the superblock. + */ + if (as->volume) + afs_clear_callback_interests(net, as->volume->servers); + kill_anon_super(sb); + if (as->volume) + afs_deactivate_volume(as->volume); + afs_destroy_sbi(as); +} + /* * get an AFS superblock */ @@ -472,12 +502,13 @@ static struct dentry *afs_mount(struct file_system_type *fs_type, _enter(",,%s,%p", dev_name, options); memset(¶ms, 0, sizeof(params)); - params.net = &__afs_net; ret = -EINVAL; if (current->nsproxy->net_ns != &init_net) goto error; - + params.net_ns = current->nsproxy->net_ns; + params.net = afs_net(params.net_ns); + /* parse the options and device name */ if (options) { ret = afs_parse_options(¶ms, options, &dev_name); @@ -563,21 +594,6 @@ error: return ERR_PTR(ret); } -static void afs_kill_super(struct super_block *sb) -{ - struct afs_super_info *as = AFS_FS_S(sb); - - /* Clear the callback interests (which will do ilookup5) before - * deactivating the superblock. - */ - if (as->volume) - afs_clear_callback_interests(as->net, as->volume->servers); - kill_anon_super(sb); - if (as->volume) - afs_deactivate_volume(as->volume); - afs_destroy_sbi(as); -} - /* * Initialise an inode cache slab element prior to any use. Note that * afs_alloc_inode() *must* reset anything that could incorrectly leak from one @@ -1661,7 +1661,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, if (mask && !(mask & req->events)) return 0; - mask = file->f_op->poll_mask(file, req->events); + mask = file->f_op->poll_mask(file, req->events) & req->events; if (!mask) return 0; @@ -1719,7 +1719,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb) spin_lock_irq(&ctx->ctx_lock); spin_lock(&req->head->lock); - mask = req->file->f_op->poll_mask(req->file, req->events); + mask = req->file->f_op->poll_mask(req->file, req->events) & req->events; if (!mask) { __add_wait_queue(req->head, &req->wait); list_add_tail(&aiocb->ki_list, &ctx->active_reqs); diff --git a/fs/eventfd.c b/fs/eventfd.c index 61c9514da5e9..ceb1031f1cac 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -156,11 +156,11 @@ static __poll_t eventfd_poll_mask(struct file *file, __poll_t eventmask) count = READ_ONCE(ctx->count); if (count > 0) - events |= EPOLLIN; + events |= (EPOLLIN & eventmask); if (count == ULLONG_MAX) events |= EPOLLERR; if (ULLONG_MAX - 1 > count) - events |= EPOLLOUT; + events |= (EPOLLOUT & eventmask); return events; } diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 67db22fe99c5..ea4436f409fb 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -922,13 +922,17 @@ static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head *head return 0; } -static __poll_t ep_eventpoll_poll(struct file *file, poll_table *wait) +static struct wait_queue_head *ep_eventpoll_get_poll_head(struct file *file, + __poll_t eventmask) { struct eventpoll *ep = file->private_data; - int depth = 0; + return &ep->poll_wait; +} - /* Insert inside our poll wait queue */ - poll_wait(file, &ep->poll_wait, wait); +static __poll_t ep_eventpoll_poll_mask(struct file *file, __poll_t eventmask) +{ + struct eventpoll *ep = file->private_data; + int depth = 0; /* * Proceed to find out if wanted events are really available inside @@ -968,7 +972,8 @@ static const struct file_operations eventpoll_fops = { .show_fdinfo = ep_show_fdinfo, #endif .release = ep_eventpoll_release, - .poll = ep_eventpoll_poll, + .get_poll_head = ep_eventpoll_get_poll_head, + .poll_mask = ep_eventpoll_poll_mask, .llseek = noop_llseek, }; diff --git a/fs/namei.c b/fs/namei.c index 2490ddb8bc90..734cef54fdf8 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2464,6 +2464,35 @@ static int lookup_one_len_common(const char *name, struct dentry *base, } /** + * try_lookup_one_len - filesystem helper to lookup single pathname component + * @name: pathname component to lookup + * @base: base directory to lookup from + * @len: maximum length @len should be interpreted to + * + * Look up a dentry by name in the dcache, returning NULL if it does not + * currently exist. The function does not try to create a dentry. + * + * Note that this routine is purely a helper for filesystem usage and should + * not be called by generic code. + * + * The caller must hold base->i_mutex. + */ +struct dentry *try_lookup_one_len(const char *name, struct dentry *base, int len) +{ + struct qstr this; + int err; + + WARN_ON_ONCE(!inode_is_locked(base->d_inode)); + + err = lookup_one_len_common(name, base, len, &this); + if (err) + return ERR_PTR(err); + + return lookup_dcache(&this, base, 0); +} +EXPORT_SYMBOL(try_lookup_one_len); + +/** * lookup_one_len - filesystem helper to lookup single pathname component * @name: pathname component to lookup * @base: base directory to lookup from diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index 63a1ca4b9dee..e2bea2ac5dfb 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -79,12 +79,11 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark) */ static int dnotify_handle_event(struct fsnotify_group *group, struct inode *inode, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, u32 mask, const void *data, int data_type, const unsigned char *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); struct dnotify_mark *dn_mark; struct dnotify_struct *dn; struct dnotify_struct **prev; @@ -95,7 +94,8 @@ static int dnotify_handle_event(struct fsnotify_group *group, if (!S_ISDIR(inode->i_mode)) return 0; - BUG_ON(vfsmount_mark); + if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) + return 0; dn_mark = container_of(inode_mark, struct dnotify_mark, fsn_mark); @@ -319,7 +319,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark); spin_lock(&fsn_mark->lock); } else { - error = fsnotify_add_mark_locked(new_fsn_mark, inode, NULL, 0); + error = fsnotify_add_inode_mark_locked(new_fsn_mark, inode, 0); if (error) { mutex_unlock(&dnotify_group->mark_mutex); goto out_err; diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index d94e8031fe5f..f90842efea13 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -87,17 +87,17 @@ static int fanotify_get_response(struct fsnotify_group *group, return ret; } -static bool fanotify_should_send_event(struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmnt_mark, - u32 event_mask, - const void *data, int data_type) +static bool fanotify_should_send_event(struct fsnotify_iter_info *iter_info, + u32 event_mask, const void *data, + int data_type) { __u32 marks_mask = 0, marks_ignored_mask = 0; const struct path *path = data; + struct fsnotify_mark *mark; + int type; - pr_debug("%s: inode_mark=%p vfsmnt_mark=%p mask=%x data=%p" - " data_type=%d\n", __func__, inode_mark, vfsmnt_mark, - event_mask, data, data_type); + pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n", + __func__, iter_info->report_mask, event_mask, data, data_type); /* if we don't have enough info to send an event to userspace say no */ if (data_type != FSNOTIFY_EVENT_PATH) @@ -108,20 +108,21 @@ static bool fanotify_should_send_event(struct fsnotify_mark *inode_mark, !d_can_lookup(path->dentry)) return false; - /* - * if the event is for a child and this inode doesn't care about - * events on the child, don't send it! - */ - if (inode_mark && - (!(event_mask & FS_EVENT_ON_CHILD) || - (inode_mark->mask & FS_EVENT_ON_CHILD))) { - marks_mask |= inode_mark->mask; - marks_ignored_mask |= inode_mark->ignored_mask; - } + fsnotify_foreach_obj_type(type) { + if (!fsnotify_iter_should_report_type(iter_info, type)) + continue; + mark = iter_info->marks[type]; + /* + * if the event is for a child and this inode doesn't care about + * events on the child, don't send it! + */ + if (type == FSNOTIFY_OBJ_TYPE_INODE && + (event_mask & FS_EVENT_ON_CHILD) && + !(mark->mask & FS_EVENT_ON_CHILD)) + continue; - if (vfsmnt_mark) { - marks_mask |= vfsmnt_mark->mask; - marks_ignored_mask |= vfsmnt_mark->ignored_mask; + marks_mask |= mark->mask; + marks_ignored_mask |= mark->ignored_mask; } if (d_is_dir(path->dentry) && @@ -178,8 +179,6 @@ init: __maybe_unused static int fanotify_handle_event(struct fsnotify_group *group, struct inode *inode, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *fanotify_mark, u32 mask, const void *data, int data_type, const unsigned char *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) @@ -199,8 +198,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM); BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR); - if (!fanotify_should_send_event(inode_mark, fanotify_mark, mask, data, - data_type)) + if (!fanotify_should_send_event(iter_info, mask, data, data_type)) return 0; pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode, diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index d478629c728b..10aac1942c9f 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -77,7 +77,7 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) struct inotify_inode_mark *inode_mark; struct inode *inode; - if (!(mark->connector->flags & FSNOTIFY_OBJ_TYPE_INODE)) + if (mark->connector->type != FSNOTIFY_OBJ_TYPE_INODE) return; inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark); @@ -116,7 +116,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) mflags |= FAN_MARK_IGNORED_SURV_MODIFY; - if (mark->connector->flags & FSNOTIFY_OBJ_TYPE_INODE) { + if (mark->connector->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = igrab(mark->connector->inode); if (!inode) return; @@ -126,7 +126,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) show_mark_fhandle(m, inode); seq_putc(m, '\n'); iput(inode); - } else if (mark->connector->flags & FSNOTIFY_OBJ_TYPE_VFSMOUNT) { + } else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { struct mount *mnt = real_mount(mark->connector->mnt); seq_printf(m, "fanotify mnt_id:%x mflags:%x mask:%x ignored_mask:%x\n", diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 613ec7e5a465..f174397b63a0 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -184,8 +184,6 @@ int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u32 mask EXPORT_SYMBOL_GPL(__fsnotify_parent); static int send_to_group(struct inode *to_tell, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, __u32 mask, const void *data, int data_is, u32 cookie, const unsigned char *file_name, @@ -195,48 +193,45 @@ static int send_to_group(struct inode *to_tell, __u32 test_mask = (mask & ~FS_EVENT_ON_CHILD); __u32 marks_mask = 0; __u32 marks_ignored_mask = 0; + struct fsnotify_mark *mark; + int type; - if (unlikely(!inode_mark && !vfsmount_mark)) { - BUG(); + if (WARN_ON(!iter_info->report_mask)) return 0; - } /* clear ignored on inode modification */ if (mask & FS_MODIFY) { - if (inode_mark && - !(inode_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) - inode_mark->ignored_mask = 0; - if (vfsmount_mark && - !(vfsmount_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) - vfsmount_mark->ignored_mask = 0; - } - - /* does the inode mark tell us to do something? */ - if (inode_mark) { - group = inode_mark->group; - marks_mask |= inode_mark->mask; - marks_ignored_mask |= inode_mark->ignored_mask; + fsnotify_foreach_obj_type(type) { + if (!fsnotify_iter_should_report_type(iter_info, type)) + continue; + mark = iter_info->marks[type]; + if (mark && + !(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) + mark->ignored_mask = 0; + } } - /* does the vfsmount_mark tell us to do something? */ - if (vfsmount_mark) { - group = vfsmount_mark->group; - marks_mask |= vfsmount_mark->mask; - marks_ignored_mask |= vfsmount_mark->ignored_mask; + fsnotify_foreach_obj_type(type) { + if (!fsnotify_iter_should_report_type(iter_info, type)) + continue; + mark = iter_info->marks[type]; + /* does the object mark tell us to do something? */ + if (mark) { + group = mark->group; + marks_mask |= mark->mask; + marks_ignored_mask |= mark->ignored_mask; + } } - pr_debug("%s: group=%p to_tell=%p mask=%x inode_mark=%p" - " vfsmount_mark=%p marks_mask=%x marks_ignored_mask=%x" + pr_debug("%s: group=%p to_tell=%p mask=%x marks_mask=%x marks_ignored_mask=%x" " data=%p data_is=%d cookie=%d\n", - __func__, group, to_tell, mask, inode_mark, vfsmount_mark, - marks_mask, marks_ignored_mask, data, - data_is, cookie); + __func__, group, to_tell, mask, marks_mask, marks_ignored_mask, + data, data_is, cookie); if (!(test_mask & marks_mask & ~marks_ignored_mask)) return 0; - return group->ops->handle_event(group, to_tell, inode_mark, - vfsmount_mark, mask, data, data_is, + return group->ops->handle_event(group, to_tell, mask, data, data_is, file_name, cookie, iter_info); } @@ -264,6 +259,57 @@ static struct fsnotify_mark *fsnotify_next_mark(struct fsnotify_mark *mark) } /* + * iter_info is a multi head priority queue of marks. + * Pick a subset of marks from queue heads, all with the + * same group and set the report_mask for selected subset. + * Returns the report_mask of the selected subset. + */ +static unsigned int fsnotify_iter_select_report_types( + struct fsnotify_iter_info *iter_info) +{ + struct fsnotify_group *max_prio_group = NULL; + struct fsnotify_mark *mark; + int type; + + /* Choose max prio group among groups of all queue heads */ + fsnotify_foreach_obj_type(type) { + mark = iter_info->marks[type]; + if (mark && + fsnotify_compare_groups(max_prio_group, mark->group) > 0) + max_prio_group = mark->group; + } + + if (!max_prio_group) + return 0; + + /* Set the report mask for marks from same group as max prio group */ + iter_info->report_mask = 0; + fsnotify_foreach_obj_type(type) { + mark = iter_info->marks[type]; + if (mark && + fsnotify_compare_groups(max_prio_group, mark->group) == 0) + fsnotify_iter_set_report_type(iter_info, type); + } + + return iter_info->report_mask; +} + +/* + * Pop from iter_info multi head queue, the marks that were iterated in the + * current iteration step. + */ +static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) +{ + int type; + + fsnotify_foreach_obj_type(type) { + if (fsnotify_iter_should_report_type(iter_info, type)) + iter_info->marks[type] = + fsnotify_next_mark(iter_info->marks[type]); + } +} + +/* * This is the main call to fsnotify. The VFS calls into hook specific functions * in linux/fsnotify.h. Those functions then in turn call here. Here will call * out to all of the registered fsnotify_group. Those groups can then use the @@ -307,15 +353,15 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, if ((mask & FS_MODIFY) || (test_mask & to_tell->i_fsnotify_mask)) { - iter_info.inode_mark = + iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = fsnotify_first_mark(&to_tell->i_fsnotify_marks); } if (mnt && ((mask & FS_MODIFY) || (test_mask & mnt->mnt_fsnotify_mask))) { - iter_info.inode_mark = + iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = fsnotify_first_mark(&to_tell->i_fsnotify_marks); - iter_info.vfsmount_mark = + iter_info.marks[FSNOTIFY_OBJ_TYPE_VFSMOUNT] = fsnotify_first_mark(&mnt->mnt_fsnotify_marks); } @@ -324,32 +370,14 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, * ignore masks are properly reflected for mount mark notifications. * That's why this traversal is so complicated... */ - while (iter_info.inode_mark || iter_info.vfsmount_mark) { - struct fsnotify_mark *inode_mark = iter_info.inode_mark; - struct fsnotify_mark *vfsmount_mark = iter_info.vfsmount_mark; - - if (inode_mark && vfsmount_mark) { - int cmp = fsnotify_compare_groups(inode_mark->group, - vfsmount_mark->group); - if (cmp > 0) - inode_mark = NULL; - else if (cmp < 0) - vfsmount_mark = NULL; - } - - ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask, - data, data_is, cookie, file_name, - &iter_info); + while (fsnotify_iter_select_report_types(&iter_info)) { + ret = send_to_group(to_tell, mask, data, data_is, cookie, + file_name, &iter_info); if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS)) goto out; - if (inode_mark) - iter_info.inode_mark = - fsnotify_next_mark(iter_info.inode_mark); - if (vfsmount_mark) - iter_info.vfsmount_mark = - fsnotify_next_mark(iter_info.vfsmount_mark); + fsnotify_iter_next(&iter_info); } ret = 0; out: diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index 60f365dc1408..34515d2c4ba3 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -9,12 +9,6 @@ #include "../mount.h" -struct fsnotify_iter_info { - struct fsnotify_mark *inode_mark; - struct fsnotify_mark *vfsmount_mark; - int srcu_idx; -}; - /* destroy all events sitting in this groups notification queue */ extern void fsnotify_flush_notify(struct fsnotify_group *group); diff --git a/fs/notify/group.c b/fs/notify/group.c index b7a4b6a69efa..aa5468f23e45 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -67,7 +67,7 @@ void fsnotify_destroy_group(struct fsnotify_group *group) fsnotify_group_stop_queueing(group); /* Clear all marks for this group and queue them for destruction */ - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_ALL_TYPES); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_ALL_TYPES_MASK); /* * Some marks can still be pinned when waiting for response from diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h index c00d2caca894..7e4578d35b61 100644 --- a/fs/notify/inotify/inotify.h +++ b/fs/notify/inotify/inotify.h @@ -25,8 +25,6 @@ extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark, struct fsnotify_group *group); extern int inotify_handle_event(struct fsnotify_group *group, struct inode *inode, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, u32 mask, const void *data, int data_type, const unsigned char *file_name, u32 cookie, struct fsnotify_iter_info *iter_info); diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 40dedb37a1f3..9ab6dde38a14 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -65,12 +65,11 @@ static int inotify_merge(struct list_head *list, int inotify_handle_event(struct fsnotify_group *group, struct inode *inode, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, u32 mask, const void *data, int data_type, const unsigned char *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); struct inotify_inode_mark *i_mark; struct inotify_event_info *event; struct fsnotify_event *fsn_event; @@ -78,7 +77,8 @@ int inotify_handle_event(struct fsnotify_group *group, int len = 0; int alloc_len = sizeof(struct inotify_event_info); - BUG_ON(vfsmount_mark); + if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) + return 0; if ((inode_mark->mask & FS_EXCL_UNLINK) && (data_type == FSNOTIFY_EVENT_PATH)) { diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index ef32f3657958..1cf5b779d862 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -485,10 +485,14 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark, struct fsnotify_group *group) { struct inotify_inode_mark *i_mark; + struct fsnotify_iter_info iter_info = { }; + + fsnotify_iter_set_report_type_mark(&iter_info, FSNOTIFY_OBJ_TYPE_INODE, + fsn_mark); /* Queue ignore event for the watch */ - inotify_handle_event(group, NULL, fsn_mark, NULL, FS_IN_IGNORED, - NULL, FSNOTIFY_EVENT_NONE, NULL, 0, NULL); + inotify_handle_event(group, NULL, FS_IN_IGNORED, NULL, + FSNOTIFY_EVENT_NONE, NULL, 0, &iter_info); i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark); /* remove this mark from the idr */ @@ -578,7 +582,7 @@ static int inotify_new_watch(struct fsnotify_group *group, } /* we are on the idr, now get on the inode */ - ret = fsnotify_add_mark_locked(&tmp_i_mark->fsn_mark, inode, NULL, 0); + ret = fsnotify_add_inode_mark_locked(&tmp_i_mark->fsn_mark, inode, 0); if (ret) { /* we failed to get on the inode, get off the idr */ inotify_remove_from_idr(group, tmp_i_mark); diff --git a/fs/notify/mark.c b/fs/notify/mark.c index e9191b416434..61f4c5fa34c7 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -119,9 +119,9 @@ static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) new_mask |= mark->mask; } - if (conn->flags & FSNOTIFY_OBJ_TYPE_INODE) + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) conn->inode->i_fsnotify_mask = new_mask; - else if (conn->flags & FSNOTIFY_OBJ_TYPE_VFSMOUNT) + else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) real_mount(conn->mnt)->mnt_fsnotify_mask = new_mask; } @@ -139,7 +139,7 @@ void fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) spin_lock(&conn->lock); __fsnotify_recalc_mask(conn); spin_unlock(&conn->lock); - if (conn->flags & FSNOTIFY_OBJ_TYPE_INODE) + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) __fsnotify_update_child_dentry_flags(conn->inode); } @@ -166,18 +166,18 @@ static struct inode *fsnotify_detach_connector_from_object( { struct inode *inode = NULL; - if (conn->flags & FSNOTIFY_OBJ_TYPE_INODE) { + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = conn->inode; rcu_assign_pointer(inode->i_fsnotify_marks, NULL); inode->i_fsnotify_mask = 0; conn->inode = NULL; - conn->flags &= ~FSNOTIFY_OBJ_TYPE_INODE; - } else if (conn->flags & FSNOTIFY_OBJ_TYPE_VFSMOUNT) { + conn->type = FSNOTIFY_OBJ_TYPE_DETACHED; + } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { rcu_assign_pointer(real_mount(conn->mnt)->mnt_fsnotify_marks, NULL); real_mount(conn->mnt)->mnt_fsnotify_mask = 0; conn->mnt = NULL; - conn->flags &= ~FSNOTIFY_OBJ_TYPE_VFSMOUNT; + conn->type = FSNOTIFY_OBJ_TYPE_DETACHED; } return inode; @@ -294,12 +294,12 @@ static void fsnotify_put_mark_wake(struct fsnotify_mark *mark) bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info) { - /* This can fail if mark is being removed */ - if (!fsnotify_get_mark_safe(iter_info->inode_mark)) - return false; - if (!fsnotify_get_mark_safe(iter_info->vfsmount_mark)) { - fsnotify_put_mark_wake(iter_info->inode_mark); - return false; + int type; + + fsnotify_foreach_obj_type(type) { + /* This can fail if mark is being removed */ + if (!fsnotify_get_mark_safe(iter_info->marks[type])) + goto fail; } /* @@ -310,13 +310,20 @@ bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info) srcu_read_unlock(&fsnotify_mark_srcu, iter_info->srcu_idx); return true; + +fail: + for (type--; type >= 0; type--) + fsnotify_put_mark_wake(iter_info->marks[type]); + return false; } void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) { + int type; + iter_info->srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); - fsnotify_put_mark_wake(iter_info->inode_mark); - fsnotify_put_mark_wake(iter_info->vfsmount_mark); + fsnotify_foreach_obj_type(type) + fsnotify_put_mark_wake(iter_info->marks[type]); } /* @@ -442,10 +449,10 @@ static int fsnotify_attach_connector_to_object( spin_lock_init(&conn->lock); INIT_HLIST_HEAD(&conn->list); if (inode) { - conn->flags = FSNOTIFY_OBJ_TYPE_INODE; + conn->type = FSNOTIFY_OBJ_TYPE_INODE; conn->inode = igrab(inode); } else { - conn->flags = FSNOTIFY_OBJ_TYPE_VFSMOUNT; + conn->type = FSNOTIFY_OBJ_TYPE_VFSMOUNT; conn->mnt = mnt; } /* @@ -479,8 +486,7 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector( if (!conn) goto out; spin_lock(&conn->lock); - if (!(conn->flags & (FSNOTIFY_OBJ_TYPE_INODE | - FSNOTIFY_OBJ_TYPE_VFSMOUNT))) { + if (conn->type == FSNOTIFY_OBJ_TYPE_DETACHED) { spin_unlock(&conn->lock); srcu_read_unlock(&fsnotify_mark_srcu, idx); return NULL; @@ -646,16 +652,16 @@ struct fsnotify_mark *fsnotify_find_mark( return NULL; } -/* Clear any marks in a group with given type */ +/* Clear any marks in a group with given type mask */ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, - unsigned int type) + unsigned int type_mask) { struct fsnotify_mark *lmark, *mark; LIST_HEAD(to_free); struct list_head *head = &to_free; /* Skip selection step if we want to clear all marks. */ - if (type == FSNOTIFY_OBJ_ALL_TYPES) { + if (type_mask == FSNOTIFY_OBJ_ALL_TYPES_MASK) { head = &group->marks_list; goto clear; } @@ -670,7 +676,7 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, */ mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING); list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) { - if (mark->connector->flags & type) + if ((1U << mark->connector->type) & type_mask) list_move(&mark->g_list, &to_free); } mutex_unlock(&group->mark_mutex); diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c index 74b37cbbd5d4..33ee8cb32f83 100644 --- a/fs/orangefs/devorangefs-req.c +++ b/fs/orangefs/devorangefs-req.c @@ -719,37 +719,6 @@ struct ORANGEFS_dev_map_desc32 { __s32 count; }; -static unsigned long translate_dev_map26(unsigned long args, long *error) -{ - struct ORANGEFS_dev_map_desc32 __user *p32 = (void __user *)args; - /* - * Depending on the architecture, allocate some space on the - * user-call-stack based on our expected layout. - */ - struct ORANGEFS_dev_map_desc __user *p = - compat_alloc_user_space(sizeof(*p)); - compat_uptr_t addr; - - *error = 0; - /* get the ptr from the 32 bit user-space */ - if (get_user(addr, &p32->ptr)) - goto err; - /* try to put that into a 64-bit layout */ - if (put_user(compat_ptr(addr), &p->ptr)) - goto err; - /* copy the remaining fields */ - if (copy_in_user(&p->total_size, &p32->total_size, sizeof(__s32))) - goto err; - if (copy_in_user(&p->size, &p32->size, sizeof(__s32))) - goto err; - if (copy_in_user(&p->count, &p32->count, sizeof(__s32))) - goto err; - return (unsigned long)p; -err: - *error = -EFAULT; - return 0; -} - /* * 32 bit user-space apps' ioctl handlers when kernel modules * is compiled as a 64 bit one @@ -758,25 +727,26 @@ static long orangefs_devreq_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long args) { long ret; - unsigned long arg = args; /* Check for properly constructed commands */ ret = check_ioctl_command(cmd); if (ret < 0) return ret; if (cmd == ORANGEFS_DEV_MAP) { - /* - * convert the arguments to what we expect internally - * in kernel space - */ - arg = translate_dev_map26(args, &ret); - if (ret < 0) { - gossip_err("Could not translate dev map\n"); - return ret; - } + struct ORANGEFS_dev_map_desc desc; + struct ORANGEFS_dev_map_desc32 d32; + + if (copy_from_user(&d32, (void __user *)args, sizeof(d32))) + return -EFAULT; + + desc.ptr = compat_ptr(d32.ptr); + desc.total_size = d32.total_size; + desc.size = d32.size; + desc.count = d32.count; + return orangefs_bufmap_initialize(&desc); } /* no other ioctl requires translation */ - return dispatch_ioctl_command(cmd, arg); + return dispatch_ioctl_command(cmd, args); } #endif /* CONFIG_COMPAT is in .config */ diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 7b4d9714f248..6ac1c92997ea 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -409,7 +409,7 @@ static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent, if (!ent) goto out; - if (qstr.len + 1 <= sizeof(ent->inline_name)) { + if (qstr.len + 1 <= SIZEOF_PDE_INLINE_NAME) { ent->name = ent->inline_name; } else { ent->name = kmalloc(qstr.len + 1, GFP_KERNEL); @@ -740,3 +740,27 @@ void *PDE_DATA(const struct inode *inode) return __PDE_DATA(inode); } EXPORT_SYMBOL(PDE_DATA); + +/* + * Pull a user buffer into memory and pass it to the file's write handler if + * one is supplied. The ->write() method is permitted to modify the + * kernel-side buffer. + */ +ssize_t proc_simple_write(struct file *f, const char __user *ubuf, size_t size, + loff_t *_pos) +{ + struct proc_dir_entry *pde = PDE(file_inode(f)); + char *buf; + int ret; + + if (!pde->write) + return -EACCES; + if (size == 0 || size > PAGE_SIZE - 1) + return -EINVAL; + buf = memdup_user_nul(ubuf, size); + if (IS_ERR(buf)) + return PTR_ERR(buf); + ret = pde->write(f, buf, size); + kfree(buf); + return ret == 0 ? size : ret; +} diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 2cf3b74391ca..85ffbd27f288 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -105,9 +105,8 @@ void __init proc_init_kmemcache(void) kmem_cache_create("pde_opener", sizeof(struct pde_opener), 0, SLAB_ACCOUNT|SLAB_PANIC, NULL); proc_dir_entry_cache = kmem_cache_create_usercopy( - "proc_dir_entry", sizeof(struct proc_dir_entry), 0, SLAB_PANIC, - offsetof(struct proc_dir_entry, inline_name), - sizeof_field(struct proc_dir_entry, inline_name), NULL); + "proc_dir_entry", SIZEOF_PDE_SLOT, 0, SLAB_PANIC, + OFFSETOF_PDE_NAME, SIZEOF_PDE_INLINE_NAME, NULL); } static int proc_show_options(struct seq_file *seq, struct dentry *root) diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 50cb22a08c2f..da3dbfa09e79 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -48,6 +48,7 @@ struct proc_dir_entry { const struct seq_operations *seq_ops; int (*single_show)(struct seq_file *, void *); }; + proc_write_t write; void *data; unsigned int state_size; unsigned int low_ino; @@ -61,14 +62,20 @@ struct proc_dir_entry { char *name; umode_t mode; u8 namelen; -#ifdef CONFIG_64BIT -#define SIZEOF_PDE_INLINE_NAME (192-155) -#else -#define SIZEOF_PDE_INLINE_NAME (128-95) -#endif - char inline_name[SIZEOF_PDE_INLINE_NAME]; + char inline_name[]; } __randomize_layout; +#define OFFSETOF_PDE_NAME offsetof(struct proc_dir_entry, inline_name) +#define SIZEOF_PDE_SLOT \ + (OFFSETOF_PDE_NAME + 34 <= 64 ? 64 : \ + OFFSETOF_PDE_NAME + 34 <= 128 ? 128 : \ + OFFSETOF_PDE_NAME + 34 <= 192 ? 192 : \ + OFFSETOF_PDE_NAME + 34 <= 256 ? 256 : \ + OFFSETOF_PDE_NAME + 34 <= 512 ? 512 : \ + 0) + +#define SIZEOF_PDE_INLINE_NAME (SIZEOF_PDE_SLOT - OFFSETOF_PDE_NAME) + extern struct kmem_cache *proc_dir_entry_cache; void pde_free(struct proc_dir_entry *pde); @@ -189,6 +196,7 @@ static inline bool is_empty_pde(const struct proc_dir_entry *pde) { return S_ISDIR(pde->mode) && !pde->proc_iops; } +extern ssize_t proc_simple_write(struct file *, const char __user *, size_t, loff_t *); /* * inode.c diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c index 7d94fa005b0d..d5e0fcb3439e 100644 --- a/fs/proc/proc_net.c +++ b/fs/proc/proc_net.c @@ -46,6 +46,9 @@ static int seq_open_net(struct inode *inode, struct file *file) WARN_ON_ONCE(state_size < sizeof(*p)); + if (file->f_mode & FMODE_WRITE && !PDE(inode)->write) + return -EACCES; + net = get_proc_net(inode); if (!net) return -ENXIO; @@ -73,6 +76,7 @@ static int seq_release_net(struct inode *ino, struct file *f) static const struct file_operations proc_net_seq_fops = { .open = seq_open_net, .read = seq_read, + .write = proc_simple_write, .llseek = seq_lseek, .release = seq_release_net, }; @@ -93,6 +97,50 @@ struct proc_dir_entry *proc_create_net_data(const char *name, umode_t mode, } EXPORT_SYMBOL_GPL(proc_create_net_data); +/** + * proc_create_net_data_write - Create a writable net_ns-specific proc file + * @name: The name of the file. + * @mode: The file's access mode. + * @parent: The parent directory in which to create. + * @ops: The seq_file ops with which to read the file. + * @write: The write method which which to 'modify' the file. + * @data: Data for retrieval by PDE_DATA(). + * + * Create a network namespaced proc file in the @parent directory with the + * specified @name and @mode that allows reading of a file that displays a + * series of elements and also provides for the file accepting writes that have + * some arbitrary effect. + * + * The functions in the @ops table are used to iterate over items to be + * presented and extract the readable content using the seq_file interface. + * + * The @write function is called with the data copied into a kernel space + * scratch buffer and has a NUL appended for convenience. The buffer may be + * modified by the @write function. @write should return 0 on success. + * + * The @data value is accessible from the @show and @write functions by calling + * PDE_DATA() on the file inode. The network namespace must be accessed by + * calling seq_file_net() on the seq_file struct. + */ +struct proc_dir_entry *proc_create_net_data_write(const char *name, umode_t mode, + struct proc_dir_entry *parent, + const struct seq_operations *ops, + proc_write_t write, + unsigned int state_size, void *data) +{ + struct proc_dir_entry *p; + + p = proc_create_reg(name, mode, &parent, data); + if (!p) + return NULL; + p->proc_fops = &proc_net_seq_fops; + p->seq_ops = ops; + p->state_size = state_size; + p->write = write; + return proc_register(parent, p); +} +EXPORT_SYMBOL_GPL(proc_create_net_data_write); + static int single_open_net(struct inode *inode, struct file *file) { struct proc_dir_entry *de = PDE(inode); @@ -119,6 +167,7 @@ static int single_release_net(struct inode *ino, struct file *f) static const struct file_operations proc_net_single_fops = { .open = single_open_net, .read = seq_read, + .write = proc_simple_write, .llseek = seq_lseek, .release = single_release_net, }; @@ -138,6 +187,49 @@ struct proc_dir_entry *proc_create_net_single(const char *name, umode_t mode, } EXPORT_SYMBOL_GPL(proc_create_net_single); +/** + * proc_create_net_single_write - Create a writable net_ns-specific proc file + * @name: The name of the file. + * @mode: The file's access mode. + * @parent: The parent directory in which to create. + * @show: The seqfile show method with which to read the file. + * @write: The write method which which to 'modify' the file. + * @data: Data for retrieval by PDE_DATA(). + * + * Create a network-namespaced proc file in the @parent directory with the + * specified @name and @mode that allows reading of a file that displays a + * single element rather than a series and also provides for the file accepting + * writes that have some arbitrary effect. + * + * The @show function is called to extract the readable content via the + * seq_file interface. + * + * The @write function is called with the data copied into a kernel space + * scratch buffer and has a NUL appended for convenience. The buffer may be + * modified by the @write function. @write should return 0 on success. + * + * The @data value is accessible from the @show and @write functions by calling + * PDE_DATA() on the file inode. The network namespace must be accessed by + * calling seq_file_single_net() on the seq_file struct. + */ +struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mode, + struct proc_dir_entry *parent, + int (*show)(struct seq_file *, void *), + proc_write_t write, + void *data) +{ + struct proc_dir_entry *p; + + p = proc_create_reg(name, mode, &parent, data); + if (!p) + return NULL; + p->proc_fops = &proc_net_single_fops; + p->single_show = show; + p->write = write; + return proc_register(parent, p); +} +EXPORT_SYMBOL_GPL(proc_create_net_single_write); + static struct net *get_proc_task_net(struct inode *dir) { struct task_struct *task; diff --git a/fs/proc/root.c b/fs/proc/root.c index 61b7340b357a..f4b1a9d2eca6 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -204,8 +204,7 @@ struct proc_dir_entry proc_root = { .proc_fops = &proc_root_operations, .parent = &proc_root, .subdir = RB_ROOT, - .name = proc_root.inline_name, - .inline_name = "/proc", + .name = "/proc", }; int pid_ns_prepare_proc(struct pid_namespace *ns) diff --git a/fs/signalfd.c b/fs/signalfd.c index cbb42f77a2bd..4fcd1498acf5 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -259,10 +259,8 @@ static const struct file_operations signalfd_fops = { .llseek = noop_llseek, }; -static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask, - int flags) +static int do_signalfd4(int ufd, sigset_t *mask, int flags) { - sigset_t sigmask; struct signalfd_ctx *ctx; /* Check the SFD_* constants for consistency. */ @@ -272,18 +270,15 @@ static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask, if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK)) return -EINVAL; - if (sizemask != sizeof(sigset_t) || - copy_from_user(&sigmask, user_mask, sizeof(sigmask))) - return -EINVAL; - sigdelsetmask(&sigmask, sigmask(SIGKILL) | sigmask(SIGSTOP)); - signotset(&sigmask); + sigdelsetmask(mask, sigmask(SIGKILL) | sigmask(SIGSTOP)); + signotset(mask); if (ufd == -1) { ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; - ctx->sigmask = sigmask; + ctx->sigmask = *mask; /* * When we call this, the initialization must be complete, since @@ -303,7 +298,7 @@ static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask, return -EINVAL; } spin_lock_irq(¤t->sighand->siglock); - ctx->sigmask = sigmask; + ctx->sigmask = *mask; spin_unlock_irq(¤t->sighand->siglock); wake_up(¤t->sighand->signalfd_wqh); @@ -316,46 +311,51 @@ static int do_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask, SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask, size_t, sizemask, int, flags) { - return do_signalfd4(ufd, user_mask, sizemask, flags); + sigset_t mask; + + if (sizemask != sizeof(sigset_t) || + copy_from_user(&mask, user_mask, sizeof(mask))) + return -EINVAL; + return do_signalfd4(ufd, &mask, flags); } SYSCALL_DEFINE3(signalfd, int, ufd, sigset_t __user *, user_mask, size_t, sizemask) { - return do_signalfd4(ufd, user_mask, sizemask, 0); + sigset_t mask; + + if (sizemask != sizeof(sigset_t) || + copy_from_user(&mask, user_mask, sizeof(mask))) + return -EINVAL; + return do_signalfd4(ufd, &mask, 0); } #ifdef CONFIG_COMPAT static long do_compat_signalfd4(int ufd, - const compat_sigset_t __user *sigmask, + const compat_sigset_t __user *user_mask, compat_size_t sigsetsize, int flags) { - sigset_t tmp; - sigset_t __user *ksigmask; + sigset_t mask; if (sigsetsize != sizeof(compat_sigset_t)) return -EINVAL; - if (get_compat_sigset(&tmp, sigmask)) - return -EFAULT; - ksigmask = compat_alloc_user_space(sizeof(sigset_t)); - if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t))) + if (get_compat_sigset(&mask, user_mask)) return -EFAULT; - - return do_signalfd4(ufd, ksigmask, sizeof(sigset_t), flags); + return do_signalfd4(ufd, &mask, flags); } COMPAT_SYSCALL_DEFINE4(signalfd4, int, ufd, - const compat_sigset_t __user *, sigmask, + const compat_sigset_t __user *, user_mask, compat_size_t, sigsetsize, int, flags) { - return do_compat_signalfd4(ufd, sigmask, sigsetsize, flags); + return do_compat_signalfd4(ufd, user_mask, sigsetsize, flags); } COMPAT_SYSCALL_DEFINE3(signalfd, int, ufd, - const compat_sigset_t __user *,sigmask, + const compat_sigset_t __user *, user_mask, compat_size_t, sigsetsize) { - return do_compat_signalfd4(ufd, sigmask, sigsetsize, 0); + return do_compat_signalfd4(ufd, user_mask, sigsetsize, 0); } #endif diff --git a/fs/splice.c b/fs/splice.c index 2365ab073a27..b3daa971f597 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1243,38 +1243,26 @@ static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf, * For lack of a better implementation, implement vmsplice() to userspace * as a simple copy of the pipes pages to the user iov. */ -static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov, - unsigned long nr_segs, unsigned int flags) +static long vmsplice_to_user(struct file *file, struct iov_iter *iter, + unsigned int flags) { - struct pipe_inode_info *pipe; - struct splice_desc sd; - long ret; - struct iovec iovstack[UIO_FASTIOV]; - struct iovec *iov = iovstack; - struct iov_iter iter; + struct pipe_inode_info *pipe = get_pipe_info(file); + struct splice_desc sd = { + .total_len = iov_iter_count(iter), + .flags = flags, + .u.data = iter + }; + long ret = 0; - pipe = get_pipe_info(file); if (!pipe) return -EBADF; - ret = import_iovec(READ, uiov, nr_segs, - ARRAY_SIZE(iovstack), &iov, &iter); - if (ret < 0) - return ret; - - sd.total_len = iov_iter_count(&iter); - sd.len = 0; - sd.flags = flags; - sd.u.data = &iter; - sd.pos = 0; - if (sd.total_len) { pipe_lock(pipe); ret = __splice_from_pipe(pipe, &sd, pipe_to_user); pipe_unlock(pipe); } - kfree(iov); return ret; } @@ -1283,14 +1271,11 @@ static long vmsplice_to_user(struct file *file, const struct iovec __user *uiov, * as splice-from-memory, where the regular splice is splice-from-file (or * to file). In both cases the output is a pipe, naturally. */ -static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov, - unsigned long nr_segs, unsigned int flags) +static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter, + unsigned int flags) { struct pipe_inode_info *pipe; - struct iovec iovstack[UIO_FASTIOV]; - struct iovec *iov = iovstack; - struct iov_iter from; - long ret; + long ret = 0; unsigned buf_flag = 0; if (flags & SPLICE_F_GIFT) @@ -1300,22 +1285,31 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov, if (!pipe) return -EBADF; - ret = import_iovec(WRITE, uiov, nr_segs, - ARRAY_SIZE(iovstack), &iov, &from); - if (ret < 0) - return ret; - pipe_lock(pipe); ret = wait_for_space(pipe, flags); if (!ret) - ret = iter_to_pipe(&from, pipe, buf_flag); + ret = iter_to_pipe(iter, pipe, buf_flag); pipe_unlock(pipe); if (ret > 0) wakeup_pipe_readers(pipe); - kfree(iov); return ret; } +static int vmsplice_type(struct fd f, int *type) +{ + if (!f.file) + return -EBADF; + if (f.file->f_mode & FMODE_WRITE) { + *type = WRITE; + } else if (f.file->f_mode & FMODE_READ) { + *type = READ; + } else { + fdput(f); + return -EBADF; + } + return 0; +} + /* * Note that vmsplice only really supports true splicing _from_ user memory * to a pipe, not the other way around. Splicing from user memory is a simple @@ -1332,57 +1326,69 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *uiov, * Currently we punt and implement it as a normal copy, see pipe_to_user(). * */ -static long do_vmsplice(int fd, const struct iovec __user *iov, - unsigned long nr_segs, unsigned int flags) +static long do_vmsplice(struct file *f, struct iov_iter *iter, unsigned int flags) { - struct fd f; - long error; - if (unlikely(flags & ~SPLICE_F_ALL)) return -EINVAL; - if (unlikely(nr_segs > UIO_MAXIOV)) - return -EINVAL; - else if (unlikely(!nr_segs)) - return 0; - error = -EBADF; - f = fdget(fd); - if (f.file) { - if (f.file->f_mode & FMODE_WRITE) - error = vmsplice_to_pipe(f.file, iov, nr_segs, flags); - else if (f.file->f_mode & FMODE_READ) - error = vmsplice_to_user(f.file, iov, nr_segs, flags); - - fdput(f); - } + if (!iov_iter_count(iter)) + return 0; - return error; + if (iov_iter_rw(iter) == WRITE) + return vmsplice_to_pipe(f, iter, flags); + else + return vmsplice_to_user(f, iter, flags); } -SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, iov, +SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov, unsigned long, nr_segs, unsigned int, flags) { - return do_vmsplice(fd, iov, nr_segs, flags); + struct iovec iovstack[UIO_FASTIOV]; + struct iovec *iov = iovstack; + struct iov_iter iter; + long error; + struct fd f; + int type; + + f = fdget(fd); + error = vmsplice_type(f, &type); + if (error) + return error; + + error = import_iovec(type, uiov, nr_segs, + ARRAY_SIZE(iovstack), &iov, &iter); + if (!error) { + error = do_vmsplice(f.file, &iter, flags); + kfree(iov); + } + fdput(f); + return error; } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32, unsigned int, nr_segs, unsigned int, flags) { - unsigned i; - struct iovec __user *iov; - if (nr_segs > UIO_MAXIOV) - return -EINVAL; - iov = compat_alloc_user_space(nr_segs * sizeof(struct iovec)); - for (i = 0; i < nr_segs; i++) { - struct compat_iovec v; - if (get_user(v.iov_base, &iov32[i].iov_base) || - get_user(v.iov_len, &iov32[i].iov_len) || - put_user(compat_ptr(v.iov_base), &iov[i].iov_base) || - put_user(v.iov_len, &iov[i].iov_len)) - return -EFAULT; + struct iovec iovstack[UIO_FASTIOV]; + struct iovec *iov = iovstack; + struct iov_iter iter; + long error; + struct fd f; + int type; + + f = fdget(fd); + error = vmsplice_type(f, &type); + if (error) + return error; + + error = compat_import_iovec(type, iov32, nr_segs, + ARRAY_SIZE(iovstack), &iov, &iter); + if (!error) { + error = do_vmsplice(f.file, &iter, flags); + kfree(iov); } - return do_vmsplice(fd, iov, nr_segs, flags); + fdput(f); + return error; } #endif diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index e64c0294f50b..b38964a7a521 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -98,8 +98,6 @@ struct fsnotify_iter_info; struct fsnotify_ops { int (*handle_event)(struct fsnotify_group *group, struct inode *inode, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, u32 mask, const void *data, int data_type, const unsigned char *file_name, u32 cookie, struct fsnotify_iter_info *iter_info); @@ -201,6 +199,57 @@ struct fsnotify_group { #define FSNOTIFY_EVENT_PATH 1 #define FSNOTIFY_EVENT_INODE 2 +enum fsnotify_obj_type { + FSNOTIFY_OBJ_TYPE_INODE, + FSNOTIFY_OBJ_TYPE_VFSMOUNT, + FSNOTIFY_OBJ_TYPE_COUNT, + FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT +}; + +#define FSNOTIFY_OBJ_TYPE_INODE_FL (1U << FSNOTIFY_OBJ_TYPE_INODE) +#define FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL (1U << FSNOTIFY_OBJ_TYPE_VFSMOUNT) +#define FSNOTIFY_OBJ_ALL_TYPES_MASK ((1U << FSNOTIFY_OBJ_TYPE_COUNT) - 1) + +struct fsnotify_iter_info { + struct fsnotify_mark *marks[FSNOTIFY_OBJ_TYPE_COUNT]; + unsigned int report_mask; + int srcu_idx; +}; + +static inline bool fsnotify_iter_should_report_type( + struct fsnotify_iter_info *iter_info, int type) +{ + return (iter_info->report_mask & (1U << type)); +} + +static inline void fsnotify_iter_set_report_type( + struct fsnotify_iter_info *iter_info, int type) +{ + iter_info->report_mask |= (1U << type); +} + +static inline void fsnotify_iter_set_report_type_mark( + struct fsnotify_iter_info *iter_info, int type, + struct fsnotify_mark *mark) +{ + iter_info->marks[type] = mark; + iter_info->report_mask |= (1U << type); +} + +#define FSNOTIFY_ITER_FUNCS(name, NAME) \ +static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ + struct fsnotify_iter_info *iter_info) \ +{ \ + return (iter_info->report_mask & FSNOTIFY_OBJ_TYPE_##NAME##_FL) ? \ + iter_info->marks[FSNOTIFY_OBJ_TYPE_##NAME] : NULL; \ +} + +FSNOTIFY_ITER_FUNCS(inode, INODE) +FSNOTIFY_ITER_FUNCS(vfsmount, VFSMOUNT) + +#define fsnotify_foreach_obj_type(type) \ + for (type = 0; type < FSNOTIFY_OBJ_TYPE_COUNT; type++) + /* * Inode / vfsmount point to this structure which tracks all marks attached to * the inode / vfsmount. The reference to inode / vfsmount is held by this @@ -209,11 +258,7 @@ struct fsnotify_group { */ struct fsnotify_mark_connector { spinlock_t lock; -#define FSNOTIFY_OBJ_TYPE_INODE 0x01 -#define FSNOTIFY_OBJ_TYPE_VFSMOUNT 0x02 -#define FSNOTIFY_OBJ_ALL_TYPES (FSNOTIFY_OBJ_TYPE_INODE | \ - FSNOTIFY_OBJ_TYPE_VFSMOUNT) - unsigned int flags; /* Type of object [lock] */ + unsigned int type; /* Type of object [lock] */ union { /* Object pointer [lock] */ struct inode *inode; struct vfsmount *mnt; @@ -356,7 +401,21 @@ extern struct fsnotify_mark *fsnotify_find_mark( extern int fsnotify_add_mark(struct fsnotify_mark *mark, struct inode *inode, struct vfsmount *mnt, int allow_dups); extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, - struct inode *inode, struct vfsmount *mnt, int allow_dups); + struct inode *inode, struct vfsmount *mnt, + int allow_dups); +/* attach the mark to the inode */ +static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark, + struct inode *inode, + int allow_dups) +{ + return fsnotify_add_mark(mark, inode, NULL, allow_dups); +} +static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark, + struct inode *inode, + int allow_dups) +{ + return fsnotify_add_mark_locked(mark, inode, NULL, allow_dups); +} /* given a group and a mark, flag mark to be freed when all references are dropped */ extern void fsnotify_destroy_mark(struct fsnotify_mark *mark, struct fsnotify_group *group); @@ -369,12 +428,12 @@ extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, unsigned /* run all the marks in a group, and clear all of the vfsmount marks */ static inline void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL); } /* run all the marks in a group, and clear all of the inode marks */ static inline void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE_FL); } extern void fsnotify_get_mark(struct fsnotify_mark *mark); extern void fsnotify_put_mark(struct fsnotify_mark *mark); diff --git a/include/linux/namei.h b/include/linux/namei.h index a982bb7cd480..a78606e8e3df 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -81,6 +81,7 @@ extern void done_path_create(struct path *, struct dentry *); extern struct dentry *kern_path_locked(const char *, struct path *); extern int kern_path_mountpoint(int, const char *, struct path *, unsigned int); +extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int); extern struct dentry *lookup_one_len(const char *, struct dentry *, int); extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int); diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 04551af2ff23..dd2052f0efb7 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -345,7 +345,7 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family) rcu_read_lock(); nat_hook = rcu_dereference(nf_nat_hook); - if (nat_hook->decode_session) + if (nat_hook && nat_hook->decode_session) nat_hook->decode_session(skb, fl); rcu_read_unlock(); #endif diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h index bfb3531fd88a..8ce271e187b6 100644 --- a/include/linux/netfilter/ipset/ip_set_timeout.h +++ b/include/linux/netfilter/ipset/ip_set_timeout.h @@ -23,6 +23,9 @@ /* Set is defined with timeout support: timeout value may be 0 */ #define IPSET_NO_TIMEOUT UINT_MAX +/* Max timeout value, see msecs_to_jiffies() in jiffies.h */ +#define IPSET_MAX_TIMEOUT (UINT_MAX >> 1)/MSEC_PER_SEC + #define ip_set_adt_opt_timeout(opt, set) \ ((opt)->ext.timeout != IPSET_NO_TIMEOUT ? (opt)->ext.timeout : (set)->timeout) @@ -32,11 +35,10 @@ ip_set_timeout_uget(struct nlattr *tb) unsigned int timeout = ip_set_get_h32(tb); /* Normalize to fit into jiffies */ - if (timeout > UINT_MAX/MSEC_PER_SEC) - timeout = UINT_MAX/MSEC_PER_SEC; + if (timeout > IPSET_MAX_TIMEOUT) + timeout = IPSET_MAX_TIMEOUT; - /* Userspace supplied TIMEOUT parameter: adjust crazy size */ - return timeout == IPSET_NO_TIMEOUT ? IPSET_NO_TIMEOUT - 1 : timeout; + return timeout; } static inline bool @@ -65,8 +67,14 @@ ip_set_timeout_set(unsigned long *timeout, u32 value) static inline u32 ip_set_timeout_get(const unsigned long *timeout) { - return *timeout == IPSET_ELEM_PERMANENT ? 0 : - jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC; + u32 t; + + if (*timeout == IPSET_ELEM_PERMANENT) + return 0; + + t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC; + /* Zero value in userspace means no timeout */ + return t == 0 ? 1 : t; } #endif /* __KERNEL__ */ diff --git a/include/linux/platform_data/shmob_drm.h b/include/linux/platform_data/shmob_drm.h index 7c686d335c12..ee495d707f17 100644 --- a/include/linux/platform_data/shmob_drm.h +++ b/include/linux/platform_data/shmob_drm.h @@ -18,9 +18,6 @@ #include <drm/drm_mode.h> -struct sh_mobile_meram_cfg; -struct sh_mobile_meram_info; - enum shmob_drm_clk_source { SHMOB_DRM_CLK_BUS, SHMOB_DRM_CLK_PERIPHERAL, @@ -93,7 +90,6 @@ struct shmob_drm_platform_data { struct shmob_drm_interface_data iface; struct shmob_drm_panel_data panel; struct shmob_drm_backlight_data backlight; - const struct sh_mobile_meram_cfg *meram; }; #endif /* __SHMOB_DRM_H__ */ diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index e518352137e7..626fc65c4336 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -14,6 +14,8 @@ struct seq_operations; #ifdef CONFIG_PROC_FS +typedef int (*proc_write_t)(struct file *, char *, size_t); + extern void proc_root_init(void); extern void proc_flush_task(struct task_struct *); @@ -61,6 +63,16 @@ struct proc_dir_entry *proc_create_net_data(const char *name, umode_t mode, struct proc_dir_entry *proc_create_net_single(const char *name, umode_t mode, struct proc_dir_entry *parent, int (*show)(struct seq_file *, void *), void *data); +struct proc_dir_entry *proc_create_net_data_write(const char *name, umode_t mode, + struct proc_dir_entry *parent, + const struct seq_operations *ops, + proc_write_t write, + unsigned int state_size, void *data); +struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mode, + struct proc_dir_entry *parent, + int (*show)(struct seq_file *, void *), + proc_write_t write, + void *data); #else /* CONFIG_PROC_FS */ diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index bbf32524ab27..fab02133a919 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -35,7 +35,7 @@ static inline void virtio_rmb(bool weak_barriers) if (weak_barriers) virt_rmb(); else - rmb(); + dma_rmb(); } static inline void virtio_wmb(bool weak_barriers) @@ -43,7 +43,7 @@ static inline void virtio_wmb(bool weak_barriers) if (weak_barriers) virt_wmb(); else - wmb(); + dma_wmb(); } static inline void virtio_store_mb(bool weak_barriers, diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index 6d6e21dee462..a0bec23c6d5e 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -631,6 +631,7 @@ struct ip_vs_service { /* alternate persistence engine */ struct ip_vs_pe __rcu *pe; + int conntrack_afmask; struct rcu_head rcu_head; }; @@ -1611,6 +1612,35 @@ static inline bool ip_vs_conn_uses_conntrack(struct ip_vs_conn *cp, return false; } +static inline int ip_vs_register_conntrack(struct ip_vs_service *svc) +{ +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + int afmask = (svc->af == AF_INET6) ? 2 : 1; + int ret = 0; + + if (!(svc->conntrack_afmask & afmask)) { + ret = nf_ct_netns_get(svc->ipvs->net, svc->af); + if (ret >= 0) + svc->conntrack_afmask |= afmask; + } + return ret; +#else + return 0; +#endif +} + +static inline void ip_vs_unregister_conntrack(struct ip_vs_service *svc) +{ +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + int afmask = (svc->af == AF_INET6) ? 2 : 1; + + if (svc->conntrack_afmask & afmask) { + nf_ct_netns_put(svc->ipvs->net, svc->af); + svc->conntrack_afmask &= ~afmask; + } +#endif +} + static inline int ip_vs_dest_conn_overhead(struct ip_vs_dest *dest) { diff --git a/include/net/netfilter/nf_conntrack_count.h b/include/net/netfilter/nf_conntrack_count.h index 1910b6572430..3a188a0923a3 100644 --- a/include/net/netfilter/nf_conntrack_count.h +++ b/include/net/netfilter/nf_conntrack_count.h @@ -20,7 +20,8 @@ unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, bool *addit); bool nf_conncount_add(struct hlist_head *head, - const struct nf_conntrack_tuple *tuple); + const struct nf_conntrack_tuple *tuple, + const struct nf_conntrack_zone *zone); void nf_conncount_cache_free(struct hlist_head *hhead); diff --git a/include/net/netfilter/nft_dup.h b/include/net/netfilter/nft_dup.h deleted file mode 100644 index 4d9d512984b2..000000000000 --- a/include/net/netfilter/nft_dup.h +++ /dev/null @@ -1,10 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _NFT_DUP_H_ -#define _NFT_DUP_H_ - -struct nft_dup_inet { - enum nft_registers sreg_addr:8; - enum nft_registers sreg_dev:8; -}; - -#endif /* _NFT_DUP_H_ */ diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index ebf809eed33a..dbe1b911a24d 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1133,6 +1133,11 @@ struct sctp_input_cb { }; #define SCTP_INPUT_CB(__skb) ((struct sctp_input_cb *)&((__skb)->cb[0])) +struct sctp_output_cb { + struct sk_buff *last; +}; +#define SCTP_OUTPUT_CB(__skb) ((struct sctp_output_cb *)&((__skb)->cb[0])) + static inline const struct sk_buff *sctp_gso_headskb(const struct sk_buff *skb) { const struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk; diff --git a/include/net/tls.h b/include/net/tls.h index 70c273777fe9..7f84ea3e217c 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -109,8 +109,7 @@ struct tls_sw_context_rx { struct strparser strp; void (*saved_data_ready)(struct sock *sk); - unsigned int (*sk_poll)(struct file *file, struct socket *sock, - struct poll_table_struct *wait); + __poll_t (*sk_poll_mask)(struct socket *sock, __poll_t events); struct sk_buff *recv_pkt; u8 control; bool decrypted; @@ -225,8 +224,7 @@ void tls_sw_free_resources_tx(struct sock *sk); void tls_sw_free_resources_rx(struct sock *sk); int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len); -unsigned int tls_sw_poll(struct file *file, struct socket *sock, - struct poll_table_struct *wait); +__poll_t tls_sw_poll_mask(struct socket *sock, __poll_t events); ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 75846164290e..d00221345c19 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -109,7 +109,7 @@ struct iocb { #undef IFLITTLE struct __aio_sigset { - sigset_t __user *sigmask; + const sigset_t __user *sigmask; size_t sigsetsize; }; diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h index c712eb6879f1..336014bf8868 100644 --- a/include/uapi/linux/netfilter/nf_conntrack_common.h +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h @@ -112,7 +112,7 @@ enum ip_conntrack_status { IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING | IPS_SEQ_ADJUST | IPS_TEMPLATE | IPS_OFFLOAD), - __IPS_MAX_BIT = 14, + __IPS_MAX_BIT = 15, }; /* Connection tracking event types */ diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index c9bf74b94f37..89438e68dc03 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -266,7 +266,7 @@ enum nft_rule_compat_attributes { * @NFT_SET_INTERVAL: set contains intervals * @NFT_SET_MAP: set is used as a dictionary * @NFT_SET_TIMEOUT: set uses timeouts - * @NFT_SET_EVAL: set contains expressions for evaluation + * @NFT_SET_EVAL: set can be updated from the evaluation path * @NFT_SET_OBJECT: set contains stateful objects */ enum nft_set_flags { diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h index 28b36545de24..27e4e441caac 100644 --- a/include/uapi/linux/nl80211.h +++ b/include/uapi/linux/nl80211.h @@ -981,18 +981,18 @@ * only the %NL80211_ATTR_IE data is used and updated with this command. * * @NL80211_CMD_SET_PMK: For offloaded 4-Way handshake, set the PMK or PMK-R0 - * for the given authenticator address (specified with &NL80211_ATTR_MAC). - * When &NL80211_ATTR_PMKR0_NAME is set, &NL80211_ATTR_PMK specifies the + * for the given authenticator address (specified with %NL80211_ATTR_MAC). + * When %NL80211_ATTR_PMKR0_NAME is set, %NL80211_ATTR_PMK specifies the * PMK-R0, otherwise it specifies the PMK. * @NL80211_CMD_DEL_PMK: For offloaded 4-Way handshake, delete the previously * configured PMK for the authenticator address identified by - * &NL80211_ATTR_MAC. + * %NL80211_ATTR_MAC. * @NL80211_CMD_PORT_AUTHORIZED: An event that indicates that the 4 way * handshake was completed successfully by the driver. The BSSID is - * specified with &NL80211_ATTR_MAC. Drivers that support 4 way handshake + * specified with %NL80211_ATTR_MAC. Drivers that support 4 way handshake * offload should send this event after indicating 802.11 association with - * &NL80211_CMD_CONNECT or &NL80211_CMD_ROAM. If the 4 way handshake failed - * &NL80211_CMD_DISCONNECT should be indicated instead. + * %NL80211_CMD_CONNECT or %NL80211_CMD_ROAM. If the 4 way handshake failed + * %NL80211_CMD_DISCONNECT should be indicated instead. * * @NL80211_CMD_CONTROL_PORT_FRAME: Control Port (e.g. PAE) frame TX request * and RX notification. This command is used both as a request to transmit @@ -1029,9 +1029,9 @@ * initiated the connection through the connect request. * * @NL80211_CMD_STA_OPMODE_CHANGED: An event that notify station's - * ht opmode or vht opmode changes using any of &NL80211_ATTR_SMPS_MODE, - * &NL80211_ATTR_CHANNEL_WIDTH,&NL80211_ATTR_NSS attributes with its - * address(specified in &NL80211_ATTR_MAC). + * ht opmode or vht opmode changes using any of %NL80211_ATTR_SMPS_MODE, + * %NL80211_ATTR_CHANNEL_WIDTH,%NL80211_ATTR_NSS attributes with its + * address(specified in %NL80211_ATTR_MAC). * * @NL80211_CMD_MAX: highest used command number * @__NL80211_CMD_AFTER_LAST: internal use @@ -2218,7 +2218,7 @@ enum nl80211_commands { * @NL80211_ATTR_EXTERNAL_AUTH_ACTION: Identify the requested external * authentication operation (u32 attribute with an * &enum nl80211_external_auth_action value). This is used with the - * &NL80211_CMD_EXTERNAL_AUTH request event. + * %NL80211_CMD_EXTERNAL_AUTH request event. * @NL80211_ATTR_EXTERNAL_AUTH_SUPPORT: Flag attribute indicating that the user * space supports external authentication. This attribute shall be used * only with %NL80211_CMD_CONNECT request. The driver may offload @@ -3491,7 +3491,7 @@ enum nl80211_sched_scan_match_attr { * @NL80211_RRF_AUTO_BW: maximum available bandwidth should be calculated * base on contiguous rules and wider channels will be allowed to cross * multiple contiguous/overlapping frequency ranges. - * @NL80211_RRF_IR_CONCURRENT: See &NL80211_FREQUENCY_ATTR_IR_CONCURRENT + * @NL80211_RRF_IR_CONCURRENT: See %NL80211_FREQUENCY_ATTR_IR_CONCURRENT * @NL80211_RRF_NO_HT40MINUS: channels can't be used in HT40- operation * @NL80211_RRF_NO_HT40PLUS: channels can't be used in HT40+ operation * @NL80211_RRF_NO_80MHZ: 80MHz operation not allowed @@ -5643,11 +5643,11 @@ enum nl80211_nan_func_attributes { * @NL80211_NAN_SRF_INCLUDE: present if the include bit of the SRF set. * This is a flag. * @NL80211_NAN_SRF_BF: Bloom Filter. Present if and only if - * &NL80211_NAN_SRF_MAC_ADDRS isn't present. This attribute is binary. + * %NL80211_NAN_SRF_MAC_ADDRS isn't present. This attribute is binary. * @NL80211_NAN_SRF_BF_IDX: index of the Bloom Filter. Mandatory if - * &NL80211_NAN_SRF_BF is present. This is a u8. + * %NL80211_NAN_SRF_BF is present. This is a u8. * @NL80211_NAN_SRF_MAC_ADDRS: list of MAC addresses for the SRF. Present if - * and only if &NL80211_NAN_SRF_BF isn't present. This is a nested + * and only if %NL80211_NAN_SRF_BF isn't present. This is a nested * attribute. Each nested attribute is a MAC address. * @NUM_NL80211_NAN_SRF_ATTR: internal * @NL80211_NAN_SRF_ATTR_MAX: highest NAN SRF attribute diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h index 308e2096291f..449132c76b1c 100644 --- a/include/uapi/linux/virtio_config.h +++ b/include/uapi/linux/virtio_config.h @@ -45,11 +45,14 @@ /* We've given up on this device. */ #define VIRTIO_CONFIG_S_FAILED 0x80 -/* Some virtio feature bits (currently bits 28 through 32) are reserved for the - * transport being used (eg. virtio_ring), the rest are per-device feature - * bits. */ +/* + * Virtio feature bits VIRTIO_TRANSPORT_F_START through + * VIRTIO_TRANSPORT_F_END are reserved for the transport + * being used (e.g. virtio_ring, virtio_pci etc.), the + * rest are per-device feature bits. + */ #define VIRTIO_TRANSPORT_F_START 28 -#define VIRTIO_TRANSPORT_F_END 34 +#define VIRTIO_TRANSPORT_F_END 38 #ifndef VIRTIO_CONFIG_NO_LEGACY /* Do we get callbacks when the ring is completely used, even if we've @@ -71,4 +74,9 @@ * this is for compatibility with legacy systems. */ #define VIRTIO_F_IOMMU_PLATFORM 33 + +/* + * Does the device support Single Root I/O Virtualization? + */ +#define VIRTIO_F_SR_IOV 37 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */ diff --git a/include/video/auo_k190xfb.h b/include/video/auo_k190xfb.h deleted file mode 100644 index ac329ee1d753..000000000000 --- a/include/video/auo_k190xfb.h +++ /dev/null @@ -1,107 +0,0 @@ -/* - * Definitions for AUO-K190X framebuffer drivers - * - * Copyright (C) 2012 Heiko Stuebner <heiko@sntech.de> - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - */ - -#ifndef _LINUX_VIDEO_AUO_K190XFB_H_ -#define _LINUX_VIDEO_AUO_K190XFB_H_ - -/* Controller standby command needs a param */ -#define AUOK190X_QUIRK_STANDBYPARAM (1 << 0) - -/* Controller standby is completely broken */ -#define AUOK190X_QUIRK_STANDBYBROKEN (1 << 1) - -/* - * Resolutions for the displays - */ -#define AUOK190X_RESOLUTION_800_600 0 -#define AUOK190X_RESOLUTION_1024_768 1 -#define AUOK190X_RESOLUTION_600_800 4 -#define AUOK190X_RESOLUTION_768_1024 5 - -/* - * struct used by auok190x. board specific stuff comes from *board - */ -struct auok190xfb_par { - struct fb_info *info; - struct auok190x_board *board; - - struct regulator *regulator; - - struct mutex io_lock; - struct delayed_work work; - wait_queue_head_t waitq; - int resolution; - int rotation; - int consecutive_threshold; - int update_cnt; - - /* panel and controller informations */ - int epd_type; - int panel_size_int; - int panel_size_float; - int panel_model; - int tcon_version; - int lut_version; - - /* individual controller callbacks */ - void (*update_partial)(struct auok190xfb_par *par, u16 y1, u16 y2); - void (*update_all)(struct auok190xfb_par *par); - bool (*need_refresh)(struct auok190xfb_par *par); - void (*init)(struct auok190xfb_par *par); - void (*recover)(struct auok190xfb_par *par); - - int update_mode; /* mode to use for updates */ - int last_mode; /* update mode last used */ - int flash; - - /* power management */ - int autosuspend_delay; - bool standby; - bool manual_standby; -}; - -/** - * Board specific platform-data - * @init: initialize the controller interface - * @cleanup: cleanup the controller interface - * @wait_for_rdy: wait until the controller is not busy anymore - * @set_ctl: change an interface control - * @set_hdb: write a value to the data register - * @get_hdb: read a value from the data register - * @setup_irq: method to setup the irq handling on the busy gpio - * @gpio_nsleep: sleep gpio - * @gpio_nrst: reset gpio - * @gpio_nbusy: busy gpio - * @resolution: one of the AUOK190X_RESOLUTION constants - * @rotation: rotation of the framebuffer - * @quirks: controller quirks to honor - * @fps: frames per second for defio - */ -struct auok190x_board { - int (*init)(struct auok190xfb_par *); - void (*cleanup)(struct auok190xfb_par *); - int (*wait_for_rdy)(struct auok190xfb_par *); - - void (*set_ctl)(struct auok190xfb_par *, unsigned char, u8); - void (*set_hdb)(struct auok190xfb_par *, u16); - u16 (*get_hdb)(struct auok190xfb_par *); - - int (*setup_irq)(struct fb_info *); - - int gpio_nsleep; - int gpio_nrst; - int gpio_nbusy; - - int resolution; - int quirks; - int fps; -}; - -#endif diff --git a/include/video/sh_mobile_lcdc.h b/include/video/sh_mobile_lcdc.h index f706b0fed399..84aa976ca4ea 100644 --- a/include/video/sh_mobile_lcdc.h +++ b/include/video/sh_mobile_lcdc.h @@ -3,7 +3,6 @@ #define __ASM_SH_MOBILE_LCDC_H__ #include <linux/fb.h> -#include <video/sh_mobile_meram.h> /* Register definitions */ #define _LDDCKR 0x410 @@ -184,7 +183,6 @@ struct sh_mobile_lcdc_chan_cfg { struct sh_mobile_lcdc_panel_cfg panel_cfg; struct sh_mobile_lcdc_bl_info bl_info; struct sh_mobile_lcdc_sys_bus_cfg sys_bus_cfg; /* only for SYSn I/F */ - const struct sh_mobile_meram_cfg *meram_cfg; struct platform_device *tx_dev; /* HDMI/DSI transmitter device */ }; @@ -193,7 +191,6 @@ struct sh_mobile_lcdc_info { int clock_source; struct sh_mobile_lcdc_chan_cfg ch[2]; struct sh_mobile_lcdc_overlay_cfg overlays[4]; - struct sh_mobile_meram_info *meram_dev; }; #endif /* __ASM_SH_MOBILE_LCDC_H__ */ diff --git a/include/video/sh_mobile_meram.h b/include/video/sh_mobile_meram.h deleted file mode 100644 index f4efc21e205d..000000000000 --- a/include/video/sh_mobile_meram.h +++ /dev/null @@ -1,95 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __VIDEO_SH_MOBILE_MERAM_H__ -#define __VIDEO_SH_MOBILE_MERAM_H__ - -/* For sh_mobile_meram_info.addr_mode */ -enum { - SH_MOBILE_MERAM_MODE0 = 0, - SH_MOBILE_MERAM_MODE1 -}; - -enum { - SH_MOBILE_MERAM_PF_NV = 0, - SH_MOBILE_MERAM_PF_RGB, - SH_MOBILE_MERAM_PF_NV24 -}; - - -struct sh_mobile_meram_priv; - -/* - * struct sh_mobile_meram_info - MERAM platform data - * @reserved_icbs: Bitmask of reserved ICBs (for instance used through UIO) - */ -struct sh_mobile_meram_info { - int addr_mode; - u32 reserved_icbs; - struct sh_mobile_meram_priv *priv; - struct platform_device *pdev; -}; - -/* icb config */ -struct sh_mobile_meram_icb_cfg { - unsigned int meram_size; /* MERAM Buffer Size to use */ -}; - -struct sh_mobile_meram_cfg { - struct sh_mobile_meram_icb_cfg icb[2]; -}; - -#if defined(CONFIG_FB_SH_MOBILE_MERAM) || \ - defined(CONFIG_FB_SH_MOBILE_MERAM_MODULE) -unsigned long sh_mobile_meram_alloc(struct sh_mobile_meram_info *meram_dev, - size_t size); -void sh_mobile_meram_free(struct sh_mobile_meram_info *meram_dev, - unsigned long mem, size_t size); -void *sh_mobile_meram_cache_alloc(struct sh_mobile_meram_info *dev, - const struct sh_mobile_meram_cfg *cfg, - unsigned int xres, unsigned int yres, - unsigned int pixelformat, - unsigned int *pitch); -void sh_mobile_meram_cache_free(struct sh_mobile_meram_info *dev, void *data); -void sh_mobile_meram_cache_update(struct sh_mobile_meram_info *dev, void *data, - unsigned long base_addr_y, - unsigned long base_addr_c, - unsigned long *icb_addr_y, - unsigned long *icb_addr_c); -#else -static inline unsigned long -sh_mobile_meram_alloc(struct sh_mobile_meram_info *meram_dev, size_t size) -{ - return 0; -} - -static inline void -sh_mobile_meram_free(struct sh_mobile_meram_info *meram_dev, - unsigned long mem, size_t size) -{ -} - -static inline void * -sh_mobile_meram_cache_alloc(struct sh_mobile_meram_info *dev, - const struct sh_mobile_meram_cfg *cfg, - unsigned int xres, unsigned int yres, - unsigned int pixelformat, - unsigned int *pitch) -{ - return ERR_PTR(-ENODEV); -} - -static inline void -sh_mobile_meram_cache_free(struct sh_mobile_meram_info *dev, void *data) -{ -} - -static inline void -sh_mobile_meram_cache_update(struct sh_mobile_meram_info *dev, void *data, - unsigned long base_addr_y, - unsigned long base_addr_c, - unsigned long *icb_addr_y, - unsigned long *icb_addr_c) -{ -} -#endif - -#endif /* __VIDEO_SH_MOBILE_MERAM_H__ */ diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 52f368b6561e..fba78047fb37 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -109,7 +109,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa audit_update_mark(audit_mark, dentry->d_inode); audit_mark->rule = krule; - ret = fsnotify_add_mark(&audit_mark->mark, inode, NULL, true); + ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true); if (ret < 0) { fsnotify_put_mark(&audit_mark->mark); audit_mark = ERR_PTR(ret); @@ -165,12 +165,11 @@ static void audit_autoremove_mark_rule(struct audit_fsnotify_mark *audit_mark) /* Update mark data in audit rules based on fsnotify events. */ static int audit_mark_handle_event(struct fsnotify_group *group, struct inode *to_tell, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, u32 mask, const void *data, int data_type, const unsigned char *dname, u32 cookie, struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); struct audit_fsnotify_mark *audit_mark; const struct inode *inode = NULL; diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c index 67e6956c0b61..c99ebaae5abc 100644 --- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -288,8 +288,8 @@ static void untag_chunk(struct node *p) if (!new) goto Fallback; - if (fsnotify_add_mark_locked(&new->mark, entry->connector->inode, - NULL, 1)) { + if (fsnotify_add_inode_mark_locked(&new->mark, entry->connector->inode, + 1)) { fsnotify_put_mark(&new->mark); goto Fallback; } @@ -354,7 +354,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree) return -ENOMEM; entry = &chunk->mark; - if (fsnotify_add_mark(entry, inode, NULL, 0)) { + if (fsnotify_add_inode_mark(entry, inode, 0)) { fsnotify_put_mark(entry); return -ENOSPC; } @@ -434,8 +434,8 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree) return -ENOENT; } - if (fsnotify_add_mark_locked(chunk_entry, - old_entry->connector->inode, NULL, 1)) { + if (fsnotify_add_inode_mark_locked(chunk_entry, + old_entry->connector->inode, 1)) { spin_unlock(&old_entry->lock); mutex_unlock(&old_entry->group->mark_mutex); fsnotify_put_mark(chunk_entry); @@ -989,8 +989,6 @@ static void evict_chunk(struct audit_chunk *chunk) static int audit_tree_handle_event(struct fsnotify_group *group, struct inode *to_tell, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, u32 mask, const void *data, int data_type, const unsigned char *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index f1ba88994508..c17c0c268436 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -160,7 +160,7 @@ static struct audit_parent *audit_init_parent(struct path *path) fsnotify_init_mark(&parent->mark, audit_watch_group); parent->mark.mask = AUDIT_FS_WATCH; - ret = fsnotify_add_mark(&parent->mark, inode, NULL, 0); + ret = fsnotify_add_inode_mark(&parent->mark, inode, 0); if (ret < 0) { audit_free_parent(parent); return ERR_PTR(ret); @@ -472,12 +472,11 @@ void audit_remove_watch_rule(struct audit_krule *krule) /* Update watch data in audit rules based on fsnotify events. */ static int audit_watch_handle_event(struct fsnotify_group *group, struct inode *to_tell, - struct fsnotify_mark *inode_mark, - struct fsnotify_mark *vfsmount_mark, u32 mask, const void *data, int data_type, const unsigned char *dname, u32 cookie, struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); const struct inode *inode; struct audit_parent *parent; diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index ed13645bd80c..76efe9a183f5 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -295,6 +295,15 @@ static const struct file_operations bpffs_map_fops = { .release = bpffs_map_release, }; +static int bpffs_obj_open(struct inode *inode, struct file *file) +{ + return -EIO; +} + +static const struct file_operations bpffs_obj_fops = { + .open = bpffs_obj_open, +}; + static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw, const struct inode_operations *iops, const struct file_operations *fops) @@ -314,7 +323,8 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw, static int bpf_mkprog(struct dentry *dentry, umode_t mode, void *arg) { - return bpf_mkobj_ops(dentry, mode, arg, &bpf_prog_iops, NULL); + return bpf_mkobj_ops(dentry, mode, arg, &bpf_prog_iops, + &bpffs_obj_fops); } static int bpf_mkmap(struct dentry *dentry, umode_t mode, void *arg) @@ -322,7 +332,7 @@ static int bpf_mkmap(struct dentry *dentry, umode_t mode, void *arg) struct bpf_map *map = arg; return bpf_mkobj_ops(dentry, mode, arg, &bpf_map_iops, - map->btf ? &bpffs_map_fops : NULL); + map->btf ? &bpffs_map_fops : &bpffs_obj_fops); } static struct dentry * diff --git a/kernel/module.c b/kernel/module.c index 68469b37d61a..f475f30eed8c 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -274,9 +274,7 @@ static void module_assert_mutex_or_preempt(void) } static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE); -#ifndef CONFIG_MODULE_SIG_FORCE module_param(sig_enforce, bool_enable_only, 0644); -#endif /* !CONFIG_MODULE_SIG_FORCE */ /* * Export sig_enforce kernel cmdline parameter to allow other subsystems rely @@ -2785,7 +2783,7 @@ static int module_sig_check(struct load_info *info, int flags) } /* Not having a signature is only an error if we're strict. */ - if (err == -ENOKEY && !sig_enforce) + if (err == -ENOKEY && !is_module_sig_enforced()) err = 0; return err; diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index 684b66bfa199..491828713e0b 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -411,6 +411,12 @@ ebt_check_watcher(struct ebt_entry_watcher *w, struct xt_tgchk_param *par, watcher = xt_request_find_target(NFPROTO_BRIDGE, w->u.name, 0); if (IS_ERR(watcher)) return PTR_ERR(watcher); + + if (watcher->family != NFPROTO_BRIDGE) { + module_put(watcher->me); + return -ENOENT; + } + w->u.watcher = watcher; par->target = watcher; @@ -709,6 +715,8 @@ ebt_check_entry(struct ebt_entry *e, struct net *net, } i = 0; + memset(&mtpar, 0, sizeof(mtpar)); + memset(&tgpar, 0, sizeof(tgpar)); mtpar.net = tgpar.net = net; mtpar.table = tgpar.table = name; mtpar.entryinfo = tgpar.entryinfo = e; @@ -730,6 +738,13 @@ ebt_check_entry(struct ebt_entry *e, struct net *net, goto cleanup_watchers; } + /* Reject UNSPEC, xtables verdicts/return values are incompatible */ + if (target->family != NFPROTO_BRIDGE) { + module_put(target->me); + ret = -ENOENT; + goto cleanup_watchers; + } + t->u.target = target; if (t->u.target == &ebt_standard_target) { if (gap < sizeof(struct ebt_standard_target)) { @@ -1606,16 +1621,16 @@ struct compat_ebt_entry_mwt { compat_uptr_t ptr; } u; compat_uint_t match_size; - compat_uint_t data[0]; + compat_uint_t data[0] __attribute__ ((aligned (__alignof__(struct compat_ebt_replace)))); }; /* account for possible padding between match_size and ->data */ static int ebt_compat_entry_padsize(void) { - BUILD_BUG_ON(XT_ALIGN(sizeof(struct ebt_entry_match)) < - COMPAT_XT_ALIGN(sizeof(struct compat_ebt_entry_mwt))); - return (int) XT_ALIGN(sizeof(struct ebt_entry_match)) - - COMPAT_XT_ALIGN(sizeof(struct compat_ebt_entry_mwt)); + BUILD_BUG_ON(sizeof(struct ebt_entry_match) < + sizeof(struct compat_ebt_entry_mwt)); + return (int) sizeof(struct ebt_entry_match) - + sizeof(struct compat_ebt_entry_mwt); } static int ebt_compat_match_offset(const struct xt_match *match, diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c index eaf05de37f75..6de981270566 100644 --- a/net/bridge/netfilter/nft_reject_bridge.c +++ b/net/bridge/netfilter/nft_reject_bridge.c @@ -261,7 +261,7 @@ static void nft_reject_br_send_v6_unreach(struct net *net, if (!reject6_br_csum_ok(oldskb, hook)) return; - nskb = alloc_skb(sizeof(struct iphdr) + sizeof(struct icmp6hdr) + + nskb = alloc_skb(sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) + LL_MAX_HEADER + len, GFP_ATOMIC); if (!nskb) return; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index a7a9c3d738ba..8e3fda9e725c 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -119,13 +119,14 @@ unsigned long neigh_rand_reach_time(unsigned long base) EXPORT_SYMBOL(neigh_rand_reach_time); -static bool neigh_del(struct neighbour *n, __u8 state, +static bool neigh_del(struct neighbour *n, __u8 state, __u8 flags, struct neighbour __rcu **np, struct neigh_table *tbl) { bool retval = false; write_lock(&n->lock); - if (refcount_read(&n->refcnt) == 1 && !(n->nud_state & state)) { + if (refcount_read(&n->refcnt) == 1 && !(n->nud_state & state) && + !(n->flags & flags)) { struct neighbour *neigh; neigh = rcu_dereference_protected(n->next, @@ -157,7 +158,7 @@ bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl) while ((n = rcu_dereference_protected(*np, lockdep_is_held(&tbl->lock)))) { if (n == ndel) - return neigh_del(n, 0, np, tbl); + return neigh_del(n, 0, 0, np, tbl); np = &n->next; } return false; @@ -185,7 +186,8 @@ static int neigh_forced_gc(struct neigh_table *tbl) * - nobody refers to it. * - it is not permanent */ - if (neigh_del(n, NUD_PERMANENT, np, tbl)) { + if (neigh_del(n, NUD_PERMANENT, NTF_EXT_LEARNED, np, + tbl)) { shrunk = 1; continue; } diff --git a/net/core/sock.c b/net/core/sock.c index f333d75ef1a9..bcc41829a16d 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -728,22 +728,9 @@ int sock_setsockopt(struct socket *sock, int level, int optname, sock_valbool_flag(sk, SOCK_DBG, valbool); break; case SO_REUSEADDR: - val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE); - if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) && - inet_sk(sk)->inet_num && - (sk->sk_reuse != val)) { - ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN; - break; - } - sk->sk_reuse = val; + sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE); break; case SO_REUSEPORT: - if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) && - inet_sk(sk)->inet_num && - (sk->sk_reuseport != valbool)) { - ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN; - break; - } sk->sk_reuseport = valbool; break; case SO_TYPE: diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c index 7d20e1f3de28..56197f0d9608 100644 --- a/net/dsa/tag_trailer.c +++ b/net/dsa/tag_trailer.c @@ -75,7 +75,8 @@ static struct sk_buff *trailer_rcv(struct sk_buff *skb, struct net_device *dev, if (!skb->dev) return NULL; - pskb_trim_rcsum(skb, skb->len - 4); + if (pskb_trim_rcsum(skb, skb->len - 4)) + return NULL; return skb; } diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 38ab97b0a2ec..ca0dad90803a 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -531,6 +531,7 @@ find_check_entry(struct ipt_entry *e, struct net *net, const char *name, return -ENOMEM; j = 0; + memset(&mtpar, 0, sizeof(mtpar)); mtpar.net = net; mtpar.table = name; mtpar.entryinfo = &e->ip; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index fed3f1c66167..bea17f1e8302 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1730,6 +1730,10 @@ process: reqsk_put(req); goto discard_it; } + if (tcp_checksum_complete(skb)) { + reqsk_put(req); + goto csum_error; + } if (unlikely(sk->sk_state != TCP_LISTEN)) { inet_csk_reqsk_queue_drop_and_put(sk, req); goto lookup; diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 4d58e2ce0b5b..8cc7c3487330 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -268,8 +268,6 @@ found: goto out_check_final; } - p = *head; - th2 = tcp_hdr(p); tcp_flag_word(th2) |= flags & (TCP_FLAG_FIN | TCP_FLAG_PSH); out_check_final: diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 89019bf59f46..c134286d6a41 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1324,6 +1324,7 @@ retry: } } + memset(&cfg, 0, sizeof(cfg)); cfg.valid_lft = min_t(__u32, ifp->valid_lft, idev->cnf.temp_valid_lft + age); cfg.preferred_lft = cnf_temp_preferred_lft + age - idev->desync_factor; @@ -1357,7 +1358,6 @@ retry: cfg.pfx = &addr; cfg.scope = ipv6_addr_scope(cfg.pfx); - cfg.rt_priority = 0; ift = ipv6_add_addr(idev, &cfg, block, NULL); if (IS_ERR(ift)) { diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 7aa4c41a3bd9..39d1d487eca2 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -934,6 +934,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, { struct fib6_info *leaf = rcu_dereference_protected(fn->leaf, lockdep_is_held(&rt->fib6_table->tb6_lock)); + enum fib_event_type event = FIB_EVENT_ENTRY_ADD; struct fib6_info *iter = NULL, *match = NULL; struct fib6_info __rcu **ins; int replace = (info->nlh && @@ -1013,6 +1014,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, "Can not append to a REJECT route"); return -EINVAL; } + event = FIB_EVENT_ENTRY_APPEND; rt->fib6_nsiblings = match->fib6_nsiblings; list_add_tail(&rt->fib6_siblings, &match->fib6_siblings); match->fib6_nsiblings++; @@ -1034,15 +1036,12 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, * insert node */ if (!replace) { - enum fib_event_type event; - if (!add) pr_warn("NLM_F_CREATE should be set when creating new route\n"); add: nlflags |= NLM_F_CREATE; - event = append ? FIB_EVENT_ENTRY_APPEND : FIB_EVENT_ENTRY_ADD; err = call_fib6_entry_notifiers(info->nl_net, event, rt, extack); if (err) diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 0758b5bcfb29..7eab959734bc 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -550,6 +550,7 @@ find_check_entry(struct ip6t_entry *e, struct net *net, const char *name, return -ENOMEM; j = 0; + memset(&mtpar, 0, sizeof(mtpar)); mtpar.net = net; mtpar.table = name; mtpar.entryinfo = &e->ipv6; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index fb956989adaf..86a0e4333d42 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2307,9 +2307,6 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk, const struct in6_addr *daddr, *saddr; struct rt6_info *rt6 = (struct rt6_info *)dst; - if (rt6->rt6i_flags & RTF_LOCAL) - return; - if (dst_metric_locked(dst, RTAX_MTU)) return; diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index b620d9b72e59..7efa9fd7e109 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1479,6 +1479,10 @@ process: reqsk_put(req); goto discard_it; } + if (tcp_checksum_complete(skb)) { + reqsk_put(req); + goto csum_error; + } if (unlikely(sk->sk_state != TCP_LISTEN)) { inet_csk_reqsk_queue_drop_and_put(sk, req); goto lookup; diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index 6616c9fd292f..5b9900889e31 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -553,6 +553,12 @@ static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *inf goto out_tunnel; } + /* L2TPv2 only accepts PPP pseudo-wires */ + if (tunnel->version == 2 && cfg.pw_type != L2TP_PWTYPE_PPP) { + ret = -EPROTONOSUPPORT; + goto out_tunnel; + } + if (tunnel->version > 2) { if (info->attrs[L2TP_ATTR_DATA_SEQ]) cfg.data_seq = nla_get_u8(info->attrs[L2TP_ATTR_DATA_SEQ]); diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index b56cb1df4fc0..55188382845c 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -612,6 +612,8 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, u32 session_id, peer_session_id; bool drop_refcnt = false; bool drop_tunnel = false; + bool new_session = false; + bool new_tunnel = false; int ver = 2; int fd; @@ -701,6 +703,15 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, .encap = L2TP_ENCAPTYPE_UDP, .debug = 0, }; + + /* Prevent l2tp_tunnel_register() from trying to set up + * a kernel socket. + */ + if (fd < 0) { + error = -EBADF; + goto end; + } + error = l2tp_tunnel_create(sock_net(sk), fd, ver, tunnel_id, peer_tunnel_id, &tcfg, &tunnel); if (error < 0) goto end; @@ -713,6 +724,7 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, goto end; } drop_tunnel = true; + new_tunnel = true; } } else { /* Error if we can't find the tunnel */ @@ -734,6 +746,12 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, session = l2tp_session_get(sock_net(sk), tunnel, session_id); if (session) { drop_refcnt = true; + + if (session->pwtype != L2TP_PWTYPE_PPP) { + error = -EPROTOTYPE; + goto end; + } + ps = l2tp_session_priv(session); /* Using a pre-existing session is fine as long as it hasn't @@ -751,6 +769,7 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, /* Default MTU must allow space for UDP/L2TP/PPP headers */ cfg.mtu = 1500 - PPPOL2TP_HEADER_OVERHEAD; cfg.mru = cfg.mtu; + cfg.pw_type = L2TP_PWTYPE_PPP; session = l2tp_session_create(sizeof(struct pppol2tp_session), tunnel, session_id, @@ -772,6 +791,7 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, goto end; } drop_refcnt = true; + new_session = true; } /* Special case: if source & dest session_id == 0x0000, this @@ -818,6 +838,12 @@ out_no_ppp: session->name); end: + if (error) { + if (new_session) + l2tp_session_delete(session); + if (new_tunnel) + l2tp_tunnel_delete(tunnel); + } if (drop_refcnt) l2tp_session_dec_refcount(session); if (drop_tunnel) @@ -1175,7 +1201,7 @@ static int pppol2tp_tunnel_ioctl(struct l2tp_tunnel *tunnel, l2tp_session_get(sock_net(sk), tunnel, stats.session_id); - if (session) { + if (session && session->pwtype == L2TP_PWTYPE_PPP) { err = pppol2tp_session_ioctl(session, cmd, arg); l2tp_session_dec_refcount(session); diff --git a/net/mac80211/main.c b/net/mac80211/main.c index fb1b1f9e7e5e..fb73451ed85e 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -1098,6 +1098,10 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) ieee80211_led_init(local); + result = ieee80211_txq_setup_flows(local); + if (result) + goto fail_flows; + rtnl_lock(); result = ieee80211_init_rate_ctrl_alg(local, @@ -1120,10 +1124,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) rtnl_unlock(); - result = ieee80211_txq_setup_flows(local); - if (result) - goto fail_flows; - #ifdef CONFIG_INET local->ifa_notifier.notifier_call = ieee80211_ifa_changed; result = register_inetaddr_notifier(&local->ifa_notifier); @@ -1149,8 +1149,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) #if defined(CONFIG_INET) || defined(CONFIG_IPV6) fail_ifa: #endif - ieee80211_txq_teardown_flows(local); - fail_flows: rtnl_lock(); rate_control_deinitialize(local); ieee80211_remove_interfaces(local); @@ -1158,6 +1156,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) rtnl_unlock(); ieee80211_led_exit(local); ieee80211_wep_free(local); + ieee80211_txq_teardown_flows(local); + fail_flows: destroy_workqueue(local->workqueue); fail_workqueue: wiphy_unregister(local->hw.wiphy); diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index bbad940c0137..8a33dac4e805 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -1234,7 +1234,10 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set, pr_debug("Create set %s with family %s\n", set->name, set->family == NFPROTO_IPV4 ? "inet" : "inet6"); -#ifndef IP_SET_PROTO_UNDEF +#ifdef IP_SET_PROTO_UNDEF + if (set->family != NFPROTO_UNSPEC) + return -IPSET_ERR_INVALID_FAMILY; +#else if (!(set->family == NFPROTO_IPV4 || set->family == NFPROTO_IPV6)) return -IPSET_ERR_INVALID_FAMILY; #endif diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 0c03c0e16a96..dd21782e2f12 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -839,6 +839,9 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest, * For now only for NAT! */ ip_vs_rs_hash(ipvs, dest); + /* FTP-NAT requires conntrack for mangling */ + if (svc->port == FTPPORT) + ip_vs_register_conntrack(svc); } atomic_set(&dest->conn_flags, conn_flags); @@ -1462,6 +1465,7 @@ static void __ip_vs_del_service(struct ip_vs_service *svc, bool cleanup) */ static void ip_vs_unlink_service(struct ip_vs_service *svc, bool cleanup) { + ip_vs_unregister_conntrack(svc); /* Hold svc to avoid double release from dest_trash */ atomic_inc(&svc->refcnt); /* diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index ba0a0fd045c8..473cce2a5231 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -168,7 +168,7 @@ static inline bool crosses_local_route_boundary(int skb_af, struct sk_buff *skb, bool new_rt_is_local) { bool rt_mode_allow_local = !!(rt_mode & IP_VS_RT_MODE_LOCAL); - bool rt_mode_allow_non_local = !!(rt_mode & IP_VS_RT_MODE_LOCAL); + bool rt_mode_allow_non_local = !!(rt_mode & IP_VS_RT_MODE_NON_LOCAL); bool rt_mode_allow_redirect = !!(rt_mode & IP_VS_RT_MODE_RDR); bool source_is_loopback; bool old_rt_is_local; diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c index 3b5059a8dcdd..d8383609fe28 100644 --- a/net/netfilter/nf_conncount.c +++ b/net/netfilter/nf_conncount.c @@ -46,6 +46,7 @@ struct nf_conncount_tuple { struct hlist_node node; struct nf_conntrack_tuple tuple; + struct nf_conntrack_zone zone; }; struct nf_conncount_rb { @@ -80,7 +81,8 @@ static int key_diff(const u32 *a, const u32 *b, unsigned int klen) } bool nf_conncount_add(struct hlist_head *head, - const struct nf_conntrack_tuple *tuple) + const struct nf_conntrack_tuple *tuple, + const struct nf_conntrack_zone *zone) { struct nf_conncount_tuple *conn; @@ -88,6 +90,7 @@ bool nf_conncount_add(struct hlist_head *head, if (conn == NULL) return false; conn->tuple = *tuple; + conn->zone = *zone; hlist_add_head(&conn->node, head); return true; } @@ -108,7 +111,7 @@ unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, /* check the saved connections */ hlist_for_each_entry_safe(conn, n, head, node) { - found = nf_conntrack_find_get(net, zone, &conn->tuple); + found = nf_conntrack_find_get(net, &conn->zone, &conn->tuple); if (found == NULL) { hlist_del(&conn->node); kmem_cache_free(conncount_conn_cachep, conn); @@ -117,7 +120,8 @@ unsigned int nf_conncount_lookup(struct net *net, struct hlist_head *head, found_ct = nf_ct_tuplehash_to_ctrack(found); - if (tuple && nf_ct_tuple_equal(&conn->tuple, tuple)) { + if (tuple && nf_ct_tuple_equal(&conn->tuple, tuple) && + nf_ct_zone_equal(found_ct, zone, zone->dir)) { /* * Just to be sure we have it only once in the list. * We should not see tuples twice unless someone hooks @@ -196,7 +200,7 @@ count_tree(struct net *net, struct rb_root *root, if (!addit) return count; - if (!nf_conncount_add(&rbconn->hhead, tuple)) + if (!nf_conncount_add(&rbconn->hhead, tuple, zone)) return 0; /* hotdrop */ return count + 1; @@ -238,6 +242,7 @@ count_tree(struct net *net, struct rb_root *root, } conn->tuple = *tuple; + conn->zone = *zone; memcpy(rbconn->key, key, sizeof(u32) * keylen); INIT_HLIST_HEAD(&rbconn->hhead); diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 39327a42879f..20a2e37c76d1 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1446,7 +1446,8 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct, } nfnl_lock(NFNL_SUBSYS_CTNETLINK); rcu_read_lock(); - if (nat_hook->parse_nat_setup) + nat_hook = rcu_dereference(nf_nat_hook); + if (nat_hook) return -EAGAIN; #endif return -EOPNOTSUPP; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f0411fbffe77..896d4a36081d 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2890,12 +2890,13 @@ static struct nft_set *nft_set_lookup_byid(const struct net *net, u32 id = ntohl(nla_get_be32(nla)); list_for_each_entry(trans, &net->nft.commit_list, list) { - struct nft_set *set = nft_trans_set(trans); + if (trans->msg_type == NFT_MSG_NEWSET) { + struct nft_set *set = nft_trans_set(trans); - if (trans->msg_type == NFT_MSG_NEWSET && - id == nft_trans_set_id(trans) && - nft_active_genmask(set, genmask)) - return set; + if (id == nft_trans_set_id(trans) && + nft_active_genmask(set, genmask)) + return set; + } } return ERR_PTR(-ENOENT); } @@ -5836,18 +5837,23 @@ static int nf_tables_flowtable_event(struct notifier_block *this, struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct nft_flowtable *flowtable; struct nft_table *table; + struct net *net; if (event != NETDEV_UNREGISTER) return 0; + net = maybe_get_net(dev_net(dev)); + if (!net) + return 0; + nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_for_each_entry(table, &dev_net(dev)->nft.tables, list) { + list_for_each_entry(table, &net->nft.tables, list) { list_for_each_entry(flowtable, &table->flowtables, list) { nft_flowtable_event(event, dev, flowtable); } } nfnl_unlock(NFNL_SUBSYS_NFTABLES); - + put_net(net); return NOTIFY_DONE; } @@ -6438,7 +6444,7 @@ static void nf_tables_abort_release(struct nft_trans *trans) kfree(trans); } -static int nf_tables_abort(struct net *net, struct sk_buff *skb) +static int __nf_tables_abort(struct net *net) { struct nft_trans *trans, *next; struct nft_trans_elem *te; @@ -6554,6 +6560,11 @@ static void nf_tables_cleanup(struct net *net) nft_validate_state_update(net, NFT_VALIDATE_SKIP); } +static int nf_tables_abort(struct net *net, struct sk_buff *skb) +{ + return __nf_tables_abort(net); +} + static bool nf_tables_valid_genid(struct net *net, u32 genid) { return net->nft.base_seq == genid; @@ -7148,9 +7159,12 @@ static int __net_init nf_tables_init_net(struct net *net) static void __net_exit nf_tables_exit_net(struct net *net) { + nfnl_lock(NFNL_SUBSYS_NFTABLES); + if (!list_empty(&net->nft.commit_list)) + __nf_tables_abort(net); __nft_release_tables(net); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); WARN_ON_ONCE(!list_empty(&net->nft.tables)); - WARN_ON_ONCE(!list_empty(&net->nft.commit_list)); } static struct pernet_operations nf_tables_net_ops = { @@ -7192,13 +7206,13 @@ err1: static void __exit nf_tables_module_exit(void) { - unregister_pernet_subsys(&nf_tables_net_ops); nfnetlink_subsys_unregister(&nf_tables_subsys); unregister_netdevice_notifier(&nf_tables_flowtable_notifier); + nft_chain_filter_fini(); + unregister_pernet_subsys(&nf_tables_net_ops); rcu_barrier(); nf_tables_core_module_exit(); kfree(info); - nft_chain_filter_fini(); } module_init(nf_tables_module_init); diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index deff10adef9c..8de912ca53d3 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -183,7 +183,8 @@ next_rule: switch (regs.verdict.code) { case NFT_JUMP: - BUG_ON(stackptr >= NFT_JUMP_STACK_SIZE); + if (WARN_ON_ONCE(stackptr >= NFT_JUMP_STACK_SIZE)) + return NF_DROP; jumpstack[stackptr].chain = chain; jumpstack[stackptr].rules = rules + 1; stackptr++; diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 4d0da7042aff..e1b6be29848d 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -429,7 +429,7 @@ replay: */ if (err == -EAGAIN) { status |= NFNL_BATCH_REPLAY; - goto next; + goto done; } } ack: @@ -456,7 +456,7 @@ ack: if (err) status |= NFNL_BATCH_FAILURE; } -next: + msglen = NLMSG_ALIGN(nlh->nlmsg_len); if (msglen > skb->len) msglen = skb->len; @@ -464,7 +464,11 @@ next: } done: if (status & NFNL_BATCH_REPLAY) { - ss->abort(net, oskb); + const struct nfnetlink_subsystem *ss2; + + ss2 = nfnl_dereference_protected(subsys_id); + if (ss2 == ss) + ss->abort(net, oskb); nfnl_err_reset(&err_list); nfnl_unlock(subsys_id); kfree_skb(skb); diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c index 84c902477a91..d21834bed805 100644 --- a/net/netfilter/nft_chain_filter.c +++ b/net/netfilter/nft_chain_filter.c @@ -318,6 +318,10 @@ static int nf_tables_netdev_event(struct notifier_block *this, event != NETDEV_CHANGENAME) return NOTIFY_DONE; + ctx.net = maybe_get_net(ctx.net); + if (!ctx.net) + return NOTIFY_DONE; + nfnl_lock(NFNL_SUBSYS_NFTABLES); list_for_each_entry(table, &ctx.net->nft.tables, list) { if (table->family != NFPROTO_NETDEV) @@ -334,6 +338,7 @@ static int nf_tables_netdev_event(struct notifier_block *this, } } nfnl_unlock(NFNL_SUBSYS_NFTABLES); + put_net(ctx.net); return NOTIFY_DONE; } diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c index 50c068d660e5..a832c59f0a9c 100644 --- a/net/netfilter/nft_connlimit.c +++ b/net/netfilter/nft_connlimit.c @@ -52,7 +52,7 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv, if (!addit) goto out; - if (!nf_conncount_add(&priv->hhead, tuple_ptr)) { + if (!nf_conncount_add(&priv->hhead, tuple_ptr, zone)) { regs->verdict.code = NF_DROP; spin_unlock_bh(&priv->lock); return; diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c index 4d49529cff61..27d7e4598ab6 100644 --- a/net/netfilter/nft_dynset.c +++ b/net/netfilter/nft_dynset.c @@ -203,9 +203,7 @@ static int nft_dynset_init(const struct nft_ctx *ctx, goto err1; set->ops->gc_init(set); } - - } else if (set->flags & NFT_SET_EVAL) - return -EINVAL; + } nft_set_ext_prepare(&priv->tmpl); nft_set_ext_add_length(&priv->tmpl, NFT_SET_EXT_KEY, set->klen); diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index d260ce2d6671..7f3a9a211034 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -66,7 +66,7 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set parent = rcu_dereference_raw(parent->rb_left); if (interval && nft_rbtree_equal(set, this, interval) && - nft_rbtree_interval_end(this) && + nft_rbtree_interval_end(rbe) && !nft_rbtree_interval_end(interval)) continue; interval = rbe; diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c index f28a0b944087..74e1b3bd6954 100644 --- a/net/netfilter/nft_socket.c +++ b/net/netfilter/nft_socket.c @@ -142,3 +142,4 @@ module_exit(nft_socket_module_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Máté Eckl"); MODULE_DESCRIPTION("nf_tables socket match module"); +MODULE_ALIAS_NFT_EXPR("socket"); diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c index 8790190c6feb..03b9a50ec93b 100644 --- a/net/netfilter/xt_CT.c +++ b/net/netfilter/xt_CT.c @@ -245,12 +245,22 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, } if (info->helper[0]) { + if (strnlen(info->helper, sizeof(info->helper)) == sizeof(info->helper)) { + ret = -ENAMETOOLONG; + goto err3; + } + ret = xt_ct_set_helper(ct, info->helper, par); if (ret < 0) goto err3; } if (info->timeout[0]) { + if (strnlen(info->timeout, sizeof(info->timeout)) == sizeof(info->timeout)) { + ret = -ENAMETOOLONG; + goto err4; + } + ret = xt_ct_set_timeout(ct, par, info->timeout); if (ret < 0) goto err4; diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c index 94df000abb92..29c38aa7f726 100644 --- a/net/netfilter/xt_connmark.c +++ b/net/netfilter/xt_connmark.c @@ -211,7 +211,7 @@ static int __init connmark_mt_init(void) static void __exit connmark_mt_exit(void) { xt_unregister_match(&connmark_mt_reg); - xt_unregister_target(connmark_tg_reg); + xt_unregister_targets(connmark_tg_reg, ARRAY_SIZE(connmark_tg_reg)); } module_init(connmark_mt_init); diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c index 6f4c5217d835..bf2890b13212 100644 --- a/net/netfilter/xt_set.c +++ b/net/netfilter/xt_set.c @@ -372,8 +372,8 @@ set_target_v2(struct sk_buff *skb, const struct xt_action_param *par) /* Normalize to fit into jiffies */ if (add_opt.ext.timeout != IPSET_NO_TIMEOUT && - add_opt.ext.timeout > UINT_MAX / MSEC_PER_SEC) - add_opt.ext.timeout = UINT_MAX / MSEC_PER_SEC; + add_opt.ext.timeout > IPSET_MAX_TIMEOUT) + add_opt.ext.timeout = IPSET_MAX_TIMEOUT; if (info->add_set.index != IPSET_INVALID_ID) ip_set_add(info->add_set.index, skb, par, &add_opt); if (info->del_set.index != IPSET_INVALID_ID) @@ -407,8 +407,8 @@ set_target_v3(struct sk_buff *skb, const struct xt_action_param *par) /* Normalize to fit into jiffies */ if (add_opt.ext.timeout != IPSET_NO_TIMEOUT && - add_opt.ext.timeout > UINT_MAX / MSEC_PER_SEC) - add_opt.ext.timeout = UINT_MAX / MSEC_PER_SEC; + add_opt.ext.timeout > IPSET_MAX_TIMEOUT) + add_opt.ext.timeout = IPSET_MAX_TIMEOUT; if (info->add_set.index != IPSET_INVALID_ID) ip_set_add(info->add_set.index, skb, par, &add_opt); if (info->del_set.index != IPSET_INVALID_ID) @@ -470,7 +470,7 @@ set_target_v3_checkentry(const struct xt_tgchk_param *par) } if (((info->flags & IPSET_FLAG_MAP_SKBPRIO) | (info->flags & IPSET_FLAG_MAP_SKBQUEUE)) && - !(par->hook_mask & (1 << NF_INET_FORWARD | + (par->hook_mask & ~(1 << NF_INET_FORWARD | 1 << NF_INET_LOCAL_OUT | 1 << NF_INET_POST_ROUTING))) { pr_info_ratelimited("mapping of prio or/and queue is allowed only from OUTPUT/FORWARD/POSTROUTING chains\n"); diff --git a/net/rds/loop.c b/net/rds/loop.c index f2bf78de5688..dac6218a460e 100644 --- a/net/rds/loop.c +++ b/net/rds/loop.c @@ -193,4 +193,5 @@ struct rds_transport rds_loop_transport = { .inc_copy_to_user = rds_message_inc_copy_to_user, .inc_free = rds_loop_inc_free, .t_name = "loopback", + .t_type = RDS_TRANS_LOOP, }; diff --git a/net/rds/rds.h b/net/rds/rds.h index b04c333d9d1c..f2272fb8cd45 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -479,6 +479,11 @@ struct rds_notifier { int n_status; }; +/* Available as part of RDS core, so doesn't need to participate + * in get_preferred transport etc + */ +#define RDS_TRANS_LOOP 3 + /** * struct rds_transport - transport specific behavioural hooks * diff --git a/net/rds/recv.c b/net/rds/recv.c index dc67458b52f0..192ac6f78ded 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -103,6 +103,11 @@ static void rds_recv_rcvbuf_delta(struct rds_sock *rs, struct sock *sk, rds_stats_add(s_recv_bytes_added_to_socket, delta); else rds_stats_add(s_recv_bytes_removed_from_socket, -delta); + + /* loop transport doesn't send/recv congestion updates */ + if (rs->rs_transport->t_type == RDS_TRANS_LOOP) + return; + now_congested = rs->rs_rcv_bytes > rds_sk_rcvbuf(rs); rdsdebug("rs %p (%pI4:%u) recv bytes %d buf %d " diff --git a/net/sctp/output.c b/net/sctp/output.c index e672dee302c7..7f849b01ec8e 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -409,6 +409,21 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk) refcount_inc(&sk->sk_wmem_alloc); } +static void sctp_packet_gso_append(struct sk_buff *head, struct sk_buff *skb) +{ + if (SCTP_OUTPUT_CB(head)->last == head) + skb_shinfo(head)->frag_list = skb; + else + SCTP_OUTPUT_CB(head)->last->next = skb; + SCTP_OUTPUT_CB(head)->last = skb; + + head->truesize += skb->truesize; + head->data_len += skb->len; + head->len += skb->len; + + __skb_header_release(skb); +} + static int sctp_packet_pack(struct sctp_packet *packet, struct sk_buff *head, int gso, gfp_t gfp) { @@ -422,7 +437,7 @@ static int sctp_packet_pack(struct sctp_packet *packet, if (gso) { skb_shinfo(head)->gso_type = sk->sk_gso_type; - NAPI_GRO_CB(head)->last = head; + SCTP_OUTPUT_CB(head)->last = head; } else { nskb = head; pkt_size = packet->size; @@ -503,15 +518,8 @@ merge: &packet->chunk_list); } - if (gso) { - if (skb_gro_receive(&head, nskb)) { - kfree_skb(nskb); - return 0; - } - if (WARN_ON_ONCE(skb_shinfo(head)->gso_segs >= - sk->sk_gso_max_segs)) - return 0; - } + if (gso) + sctp_packet_gso_append(head, nskb); pkt_count++; } while (!list_empty(&packet->chunk_list)); diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 973b4471b532..da7f02edcd37 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -1273,8 +1273,7 @@ static __poll_t smc_accept_poll(struct sock *parent) return mask; } -static __poll_t smc_poll(struct file *file, struct socket *sock, - poll_table *wait) +static __poll_t smc_poll_mask(struct socket *sock, __poll_t events) { struct sock *sk = sock->sk; __poll_t mask = 0; @@ -1290,7 +1289,7 @@ static __poll_t smc_poll(struct file *file, struct socket *sock, if ((sk->sk_state == SMC_INIT) || smc->use_fallback) { /* delegate to CLC child sock */ release_sock(sk); - mask = smc->clcsock->ops->poll(file, smc->clcsock, wait); + mask = smc->clcsock->ops->poll_mask(smc->clcsock, events); lock_sock(sk); sk->sk_err = smc->clcsock->sk->sk_err; if (sk->sk_err) { @@ -1308,11 +1307,6 @@ static __poll_t smc_poll(struct file *file, struct socket *sock, } } } else { - if (sk->sk_state != SMC_CLOSED) { - release_sock(sk); - sock_poll_wait(file, sk_sleep(sk), wait); - lock_sock(sk); - } if (sk->sk_err) mask |= EPOLLERR; if ((sk->sk_shutdown == SHUTDOWN_MASK) || @@ -1625,7 +1619,7 @@ static const struct proto_ops smc_sock_ops = { .socketpair = sock_no_socketpair, .accept = smc_accept, .getname = smc_getname, - .poll = smc_poll, + .poll_mask = smc_poll_mask, .ioctl = smc_ioctl, .listen = smc_listen, .shutdown = smc_shutdown, diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 301f22430469..a127d61e8af9 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -712,7 +712,7 @@ static int __init tls_register(void) build_protos(tls_prots[TLSV4], &tcp_prot); tls_sw_proto_ops = inet_stream_ops; - tls_sw_proto_ops.poll = tls_sw_poll; + tls_sw_proto_ops.poll_mask = tls_sw_poll_mask; tls_sw_proto_ops.splice_read = tls_sw_splice_read; #ifdef CONFIG_TLS_DEVICE diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 8ca57d01b18f..f127fac88acf 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -191,18 +191,12 @@ static void tls_free_both_sg(struct sock *sk) } static int tls_do_encryption(struct tls_context *tls_ctx, - struct tls_sw_context_tx *ctx, size_t data_len, - gfp_t flags) + struct tls_sw_context_tx *ctx, + struct aead_request *aead_req, + size_t data_len) { - unsigned int req_size = sizeof(struct aead_request) + - crypto_aead_reqsize(ctx->aead_send); - struct aead_request *aead_req; int rc; - aead_req = kzalloc(req_size, flags); - if (!aead_req) - return -ENOMEM; - ctx->sg_encrypted_data[0].offset += tls_ctx->tx.prepend_size; ctx->sg_encrypted_data[0].length -= tls_ctx->tx.prepend_size; @@ -219,7 +213,6 @@ static int tls_do_encryption(struct tls_context *tls_ctx, ctx->sg_encrypted_data[0].offset -= tls_ctx->tx.prepend_size; ctx->sg_encrypted_data[0].length += tls_ctx->tx.prepend_size; - kfree(aead_req); return rc; } @@ -228,8 +221,14 @@ static int tls_push_record(struct sock *sk, int flags, { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx); + struct aead_request *req; int rc; + req = kzalloc(sizeof(struct aead_request) + + crypto_aead_reqsize(ctx->aead_send), sk->sk_allocation); + if (!req) + return -ENOMEM; + sg_mark_end(ctx->sg_plaintext_data + ctx->sg_plaintext_num_elem - 1); sg_mark_end(ctx->sg_encrypted_data + ctx->sg_encrypted_num_elem - 1); @@ -245,15 +244,14 @@ static int tls_push_record(struct sock *sk, int flags, tls_ctx->pending_open_record_frags = 0; set_bit(TLS_PENDING_CLOSED_RECORD, &tls_ctx->flags); - rc = tls_do_encryption(tls_ctx, ctx, ctx->sg_plaintext_size, - sk->sk_allocation); + rc = tls_do_encryption(tls_ctx, ctx, req, ctx->sg_plaintext_size); if (rc < 0) { /* If we are called from write_space and * we fail, we need to set this SOCK_NOSPACE * to trigger another write_space in the future. */ set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); - return rc; + goto out_req; } free_sg(sk, ctx->sg_plaintext_data, &ctx->sg_plaintext_num_elem, @@ -268,6 +266,8 @@ static int tls_push_record(struct sock *sk, int flags, tls_err_abort(sk, EBADMSG); tls_advance_record_sn(sk, &tls_ctx->tx); +out_req: + kfree(req); return rc; } @@ -754,7 +754,7 @@ int tls_sw_recvmsg(struct sock *sk, struct sk_buff *skb; ssize_t copied = 0; bool cmsg = false; - int err = 0; + int target, err = 0; long timeo; flags |= nonblock; @@ -764,6 +764,7 @@ int tls_sw_recvmsg(struct sock *sk, lock_sock(sk); + target = sock_rcvlowat(sk, flags & MSG_WAITALL, len); timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); do { bool zc = false; @@ -856,6 +857,9 @@ fallback_to_reg_recv: goto recv_end; } } + /* If we have a new message from strparser, continue now. */ + if (copied >= target && !ctx->recv_pkt) + break; } while (len); recv_end: @@ -915,23 +919,22 @@ splice_read_end: return copied ? : err; } -unsigned int tls_sw_poll(struct file *file, struct socket *sock, - struct poll_table_struct *wait) +__poll_t tls_sw_poll_mask(struct socket *sock, __poll_t events) { - unsigned int ret; struct sock *sk = sock->sk; struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); + __poll_t mask; - /* Grab POLLOUT and POLLHUP from the underlying socket */ - ret = ctx->sk_poll(file, sock, wait); + /* Grab EPOLLOUT and EPOLLHUP from the underlying socket */ + mask = ctx->sk_poll_mask(sock, events); - /* Clear POLLIN bits, and set based on recv_pkt */ - ret &= ~(POLLIN | POLLRDNORM); + /* Clear EPOLLIN bits, and set based on recv_pkt */ + mask &= ~(EPOLLIN | EPOLLRDNORM); if (ctx->recv_pkt) - ret |= POLLIN | POLLRDNORM; + mask |= EPOLLIN | EPOLLRDNORM; - return ret; + return mask; } static int tls_read_size(struct strparser *strp, struct sk_buff *skb) @@ -1188,7 +1191,7 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx) sk->sk_data_ready = tls_data_ready; write_unlock_bh(&sk->sk_callback_lock); - sw_ctx_rx->sk_poll = sk->sk_socket->ops->poll; + sw_ctx_rx->sk_poll_mask = sk->sk_socket->ops->poll_mask; strp_check_rcv(&sw_ctx_rx->strp); } diff --git a/net/wireless/core.c b/net/wireless/core.c index 5fe35aafdd9c..48e8097339ab 100644 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -1012,6 +1012,7 @@ void cfg80211_unregister_wdev(struct wireless_dev *wdev) nl80211_notify_iface(rdev, wdev, NL80211_CMD_DEL_INTERFACE); list_del_rcu(&wdev->list); + synchronize_rcu(); rdev->devlist_generation++; switch (wdev->iftype) { diff --git a/net/wireless/util.c b/net/wireless/util.c index b5bb1c309914..3c654cd7ba56 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -1746,6 +1746,8 @@ int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr, if (!rdev->ops->get_station) return -EOPNOTSUPP; + memset(sinfo, 0, sizeof(*sinfo)); + return rdev_get_station(rdev, dev, mac_addr, sinfo); } EXPORT_SYMBOL(cfg80211_get_station); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index b9ef487c4618..f47abb46c587 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -204,7 +204,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem) long npgs; int err; - umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL); + umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), + GFP_KERNEL | __GFP_NOWARN); if (!umem->pgs) return -ENOMEM; diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 607ed8729c06..7a6214e9ae58 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -16,9 +16,7 @@ LDLIBS += -lcap -lelf -lrt -lpthread TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read all: $(TEST_CUSTOM_PROGS) -$(TEST_CUSTOM_PROGS): urandom_read - -urandom_read: urandom_read.c +$(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c $(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id # Order correspond to 'make run_tests' order diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json index de97e4ff705c..637ea0219617 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json @@ -568,7 +568,7 @@ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*use tcindex 65535.*index 1", "matchCount": "1", "teardown": [ - "$TC actions flush action skbedit" + "$TC actions flush action ife" ] }, { |