aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kvm/svm
AgeCommit message (Collapse)Author
2021-10-21KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATAMasahiro Kozuka
Flush the destination page before invoking RECEIVE_UPDATE_DATA, as the PSP encrypts the data with the guest's key when writing to guest memory. If the target memory was not previously encrypted, the cache may contain dirty, unecrypted data that will persist on non-coherent systems. Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Cc: stable@vger.kernel.org Cc: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Masahiro Kozuka <masa.koz@kozuka.jp> [sean: converted bug report to changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210914210951.2994260-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: SEV-ES: reduce ghcb_sa_len to 32 bitsPaolo Bonzini
The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT), so there is no need for it to be a u64. This fixes a build error on 32-bit systems: i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io: sev.c:(.text+0x110f): undefined reference to `__udivdi3' Cc: stable@vger.kernel.org Fixes: 019057bd73d1 ("KVM: SEV-ES: fix length of string I/O") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: SEV-ES: Set guest_state_protected after VMSA updatePeter Gonda
The refactoring in commit bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA") left behind the assignment to svm->vcpu.arch.guest_state_protected; add it back. Signed-off-by: Peter Gonda <pgonda@google.com> [Delta between v2 and v3 of Peter's patch, which had already been committed; the commit message is my own. - Paolo] Fixes: bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-15KVM: SEV-ES: fix length of string I/OPaolo Bonzini
The size of the data in the scratch buffer is not divided by the size of each port I/O operation, so vcpu->arch.pio.count ends up being larger than it should be by a factor of size. Cc: stable@vger.kernel.org Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest") Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-23KVM: x86: nSVM: don't copy virt_ext from vmcb12Maxim Levitsky
These field correspond to features that we don't expose yet to L2 While currently there are no CVE worthy features in this field, if AMD adds more features to this field, that could allow guest escapes similar to CVE-2021-3653 and CVE-2021-3656. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-6-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-23KVM: x86: nSVM: test eax for 4K alignment for GP errata workaroundMaxim Levitsky
GP SVM errata workaround made the #GP handler always emulate the SVM instructions. However these instructions #GP in case the operand is not 4K aligned, but the workaround code didn't check this and we ended up emulating these instructions anyway. This is only an emulation accuracy check bug as there is no harm for KVM to read/write unaligned vmcb images. Fixes: 82a11e9c6fa2 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-23KVM: x86: nSVM: restore int_vector in svm_clear_vintrMaxim Levitsky
In svm_clear_vintr we try to restore the virtual interrupt injection that might be pending, but we fail to restore the interrupt vector. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smmMaxim Levitsky
Use return statements instead of nested if, and fix error path to free all the maps that were allocated. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM modeMaxim Levitsky
Currently the KVM_REQ_GET_NESTED_STATE_PAGES on SVM only reloads PDPTRs, and MSR bitmap, with former not really needed for SMM as SMM exit code reloads them again from SMRAM'S CR3, and later happens to work since MSR bitmap isn't modified while in SMM. Still it is better to be consistient with VMX. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on ↵Maxim Levitsky
SMM exit Otherwise guest entry code might see incorrect L1 state (e.g paging state). Fixes: 37be407b2ce8 ("KVM: nSVM: Fix L1 state corruption upon return from SMM") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SEV: Allow some commands for mirror VMPeter Gonda
A mirrored SEV-ES VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to setup its vCPUs and have them measured, and their VMSAs encrypted. Without this change, it is impossible to have mirror VMs as part of SEV-ES VMs. Also allow the guest status check and debugging commands since they do not change any guest state. Signed-off-by: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Nathan Tempelman <natet@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steve Rutherford <srutherford@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21) Message-Id: <20210921150345.2221634-3-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SEV: Update svm_vm_copy_asid_from for SEV-ESPeter Gonda
For mirroring SEV-ES the mirror VM will need more then just the ASID. The FD and the handle are required to all the mirror to call psp commands. The mirror VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to setup its vCPUs' VMSAs for SEV-ES. Signed-off-by: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Nathan Tempelman <natet@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steve Rutherford <srutherford@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21) Message-Id: <20210921150345.2221634-2-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SEV: Pin guest memory for write for RECEIVE_UPDATE_DATASean Christopherson
Require the target guest page to be writable when pinning memory for RECEIVE_UPDATE_DATA. Per the SEV API, the PSP writes to guest memory: The result is then encrypted with GCTX.VEK and written to the memory pointed to by GUEST_PADDR field. Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Cc: stable@vger.kernel.org Cc: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210914210951.2994260-2-seanjc@google.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Gonda <pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SVM: fix missing sev_decommission in sev_receive_startMingwei Zhang
DECOMMISSION the current SEV context if binding an ASID fails after RECEIVE_START. Per AMD's SEV API, RECEIVE_START generates a new guest context and thus needs to be paired with DECOMMISSION: The RECEIVE_START command is the only command other than the LAUNCH_START command that generates a new guest context and guest handle. The missing DECOMMISSION can result in subsequent SEV launch failures, as the firmware leaks memory and might not able to allocate more SEV guest contexts in the future. Note, LAUNCH_START suffered the same bug, but was previously fixed by commit 934002cd660b ("KVM: SVM: Call SEV Guest Decommission if ASID binding fails"). Cc: Alper Gun <alpergun@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: David Rienjes <rientjes@google.com> Cc: Marc Orr <marcorr@google.com> Cc: John Allen <john.allen@amd.com> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vipin Sharma <vipinsh@google.com> Cc: stable@vger.kernel.org Reviewed-by: Marc Orr <marcorr@google.com> Acked-by: Brijesh Singh <brijesh.singh@amd.com> Fixes: af43cbbf954b ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command") Signed-off-by: Mingwei Zhang <mizhang@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210912181815.3899316-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: SEV: Acquire vcpu mutex when updating VMSAPeter Gonda
The update-VMSA ioctl touches data stored in struct kvm_vcpu, and therefore should not be performed concurrently with any VCPU ioctl that might cause KVM or the processor to use the same data. Adds vcpu mutex guard to the VMSA updating code. Refactors out __sev_launch_update_vmsa() function to deal with per vCPU parts of sev_launch_update_vmsa(). Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Signed-off-by: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20210915171755.3773766-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups s390: - enable interpretation of specification exceptions - fix a vcpu_idx vs vcpu_id mixup x86: - fast (lockless) page fault support for the new MMU - new MMU now the default - increased maximum allowed VCPU count - allow inhibit IRQs on KVM_RUN while debugging guests - let Hyper-V-enabled guests run with virtualized LAPIC as long as they do not enable the Hyper-V "AutoEOI" feature - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC) - tuning for the case when two-dimensional paging (EPT/NPT) is disabled - bugfixes and cleanups, especially with respect to vCPU reset and choosing a paging mode based on CR0/CR4/EFER - support for 5-level page table on AMD processors Generic: - MMU notifier invalidation callbacks do not take mmu_lock unless necessary - improved caching of LRU kvm_memory_slot - support for histogram statistics - add statistics for halt polling and remote TLB flush requests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits) KVM: Drop unused kvm_dirty_gfn_invalid() KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted KVM: MMU: mark role_regs and role accessors as maybe unused KVM: MIPS: Remove a "set but not used" variable x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait KVM: stats: Add VM stat for remote tlb flush requests KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 kvm: x86: Increase MAX_VCPUS to 1024 kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host KVM: s390: index kvm->arch.idle_mask by vcpu_idx KVM: s390: Enable specification exception interpretation KVM: arm64: Trim guest debug exception handling KVM: SVM: Add 5-level page table support for SVM ...
2021-08-20KVM: SVM: Add 5-level page table support for SVMWei Huang
When the 5-level page table is enabled on host OS, the nested page table for guest VMs must use 5-level as well. Update get_npt_level() function to reflect this requirement. In the meanwhile, remove the code that prevents kvm-amd driver from being loaded when 5-level page table is detected. Signed-off-by: Wei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-4-wei.huang2@amd.com> [Tweak condition as suggested by Sean. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: Allow CPU to force vendor-specific TDP levelWei Huang
AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set. To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force a fixed TDP level. Signed-off-by: Wei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: split svm_handle_invalid_exitMaxim Levitsky
Split the check for having a vmexit handler to svm_check_exit_valid, and make svm_handle_invalid_exit only handle a vmexit that is already not valid. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210811122927.900604-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: AVIC: drop unsupported AVIC base relocation codeMaxim Levitsky
APIC base relocation is not supported anyway and won't work correctly so just drop the code that handles it and keep AVIC MMIO bar at the default APIC base. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-17-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVICMaxim Levitsky
Currently it is possible to have the following scenario: 1. AVIC is disabled by svm_refresh_apicv_exec_ctrl 2. svm_vcpu_blocking calls avic_vcpu_put which does nothing 3. svm_vcpu_unblocking enables the AVIC (due to KVM_REQ_APICV_UPDATE) and then calls avic_vcpu_load 4. warning is triggered in avic_vcpu_load since AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK was never cleared While it is possible to just remove the warning, it seems to be more robust to fully disable/enable AVIC in svm_refresh_apicv_exec_ctrl by calling the avic_vcpu_load/avic_vcpu_put Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-16-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: move check for kvm_vcpu_apicv_active outside of avic_vcpu_{put|load}Maxim Levitsky
No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-15-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: remove svm_toggle_avic_for_irq_windowMaxim Levitsky
Now that kvm_request_apicv_update doesn't need to drop the kvm->srcu lock, we can call kvm_request_apicv_update directly. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210810205251.424103-13-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: SVM: add warning for mistmatch between AVIC vcpu state and AVIC inhibitionMaxim Levitsky
It is never a good idea to enter a guest on a vCPU when the AVIC inhibition state doesn't match the enablement of the AVIC on the vCPU. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-11-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: don't disable APICv memslot when inhibitedMaxim Levitsky
Thanks to the former patches, it is now possible to keep the APICv memslot always enabled, and it will be invisible to the guest when it is inhibited This code is based on a suggestion from Sean Christopherson: https://lkml.org/lkml/2021/7/19/2970 Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-9-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-16KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)Maxim Levitsky
If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor), then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only possible by making L0 intercept these instructions. Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted, and thus read/write portions of the host physical memory. Fixes: 89c8a4984fc9 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-16KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)Maxim Levitsky
* Invert the mask of bits that we pick from L2 in nested_vmcb02_prepare_control * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr This fixes a security issue that allowed a malicious L1 to run L2 with AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled AVIC to read/write the host physical memory at some offsets. Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86: Move declaration of kvm_spurious_fault() to x86.hUros Bizjak
Move the declaration of kvm_spurious_fault() to KVM's "private" x86.h, it should never be called by anything other than low level KVM code. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> [sean: rebased to a series without __ex()/__kvm_handle_fault_on_reboot()] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210809173955.1710866-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86: Kill off __ex() and __kvm_handle_fault_on_reboot()Sean Christopherson
Remove the __kvm_handle_fault_on_reboot() and __ex() macros now that all VMX and SVM instructions use asm goto to handle the fault (or in the case of VMREAD, completely custom logic). Drop kvm_spurious_fault()'s asmlinkage annotation as __kvm_handle_fault_on_reboot() was the only flow that invoked it from assembly code. Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210809173955.1710866-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-10Merge branch 'kvm-vmx-secctl' into HEADPaolo Bonzini
Merge common topic branch for 5.14-rc6 and 5.15 merge window.
2021-08-04KVM: SVM: improve the code readability for ASID managementMingwei Zhang
KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped because it is never used by VM. Thus, in existing code, ASID value and its bitmap postion always has an 'offset-by-1' relationship. Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range [min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately. Existing code mixes the usage of ASID value and its bitmap position by using the same variable called 'min_asid'. Fix the min_asid usage: ensure that its usage is consistent with its name; allocate extra size for ASID 0 to ensure that each ASID has the same value with its bitmap position. Add comments on ASID bitmap allocation to clarify the size change. Signed-off-by: Mingwei Zhang <mizhang@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Marc Orr <marcorr@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Alper Gun <alpergun@google.com> Cc: Dionna Glaze <dionnaglaze@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vipin Sharma <vipinsh@google.com> Cc: Peter Gonda <pgonda@google.com> Cc: Joerg Roedel <joro@8bytes.org> Message-Id: <20210802180903.159381-1-mizhang@google.com> [Fix up sev_asid_free to also index by ASID, as suggested by Sean Christopherson, and use nr_asids in sev_cpu_init. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-04KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCBSean Christopherson
Use the raw ASID, not ASID-1, when nullifying the last used VMCB when freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by the raw ASID, thus KVM could get a false negative when checking for a different VMCB if KVM manages to reallocate the same ASID+VMCB combo for a new VM. Note, this cannot cause a functional issue _in the current code_, as pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is guaranteed to mismatch on the first VMRUN. However, prior to commit 8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the last_cpu variable. Thus it's theoretically possible that older versions of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID and VMCB exactly match those of a prior VM. Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: nSVM: remove useless kvm_clear_*_queuePaolo Bonzini
For an event to be in injected state when nested_svm_vmrun executes, it must have come from exitintinfo when svm_complete_interrupts ran: vcpu_enter_guest static_call(kvm_x86_run) -> svm_vcpu_run svm_complete_interrupts // now the event went from "exitintinfo" to "injected" static_call(kvm_x86_handle_exit) -> handle_exit svm_invoke_exit_handler vmrun_interception nested_svm_vmrun However, no event could have been in exitintinfo before a VMRUN vmexit. The code in svm.c is a bit more permissive than the one in vmx.c: if (is_external_interrupt(svm->vmcb->control.exit_int_info) && exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR && exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH && exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI) but in any case, a VMRUN instruction would not even start to execute during an attempted event delivery. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Drop redundant clearing of vcpu->arch.hflags at INIT/RESETSean Christopherson
Drop redundant clears of vcpu->arch.hflags in init_vmcb() since kvm_vcpu_reset() always clears hflags, and it is also always zero at vCPU creation time. And of course, the second clearing in init_vmcb() was always redundant. Suggested-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-46-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Emulate #INIT in response to triple fault shutdownSean Christopherson
Emulate a full #INIT instead of simply initializing the VMCB if the guest hits a shutdown. Initializing the VMCB but not other vCPU state, much of which is mirrored by the VMCB, results in incoherent and broken vCPU state. Ideally, KVM would not automatically init anything on shutdown, and instead put the vCPU into e.g. KVM_MP_STATE_UNINITIALIZED and force userspace to explicitly INIT or RESET the vCPU. Even better would be to add KVM_MP_STATE_SHUTDOWN, since technically NMI can break shutdown (and SMI on Intel CPUs). But, that ship has sailed, and emulating #INIT is the next best thing as that has at least some connection with reality since there exist bare metal platforms that automatically INIT the CPU if it hits shutdown. Fixes: 46fe4ddd9dbb ("[PATCH] KVM: SVM: Propagate cpu shutdown events to userspace") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-45-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Move setting of sregs during vCPU RESET/INIT to common x86Sean Christopherson
Move the setting of CR0, CR4, EFER, RFLAGS, and RIP from vendor code to common x86. VMX and SVM now have near-identical sequences, the only difference being that VMX updates the exception bitmap. Updating the bitmap on SVM is unnecessary, but benign. Unfortunately it can't be left behind in VMX due to the need to update exception intercepts after the control registers are set. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-37-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Stuff save->dr6 at during VMSA sync, not at RESET/INITSean Christopherson
Move code to stuff vmcb->save.dr6 to its architectural init value from svm_vcpu_reset() into sev_es_sync_vmsa(). Except for protected guests, a.k.a. SEV-ES guests, vmcb->save.dr6 is set during VM-Enter, i.e. the extra write is unnecessary. For SEV-ES, stuffing save->dr6 handles a theoretical case where the VMSA could be encrypted before the first KVM_RUN. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Drop redundant writes to vmcb->save.cr4 at RESET/INITSean Christopherson
Drop direct writes to vmcb->save.cr4 during vCPU RESET/INIT, as the values being written are fully redundant with respect to svm_set_cr4(vcpu, 0) a few lines earlier. Note, svm_set_cr4() also correctly forces X86_CR4_PAE when NPT is disabled. No functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-32-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Tweak order of cr0/cr4/efer writes at RESET/INITSean Christopherson
Hoist svm_set_cr0() up in the sequence of register initialization during vCPU RESET/INIT, purely to match VMX so that a future patch can move the sequences to common x86. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Don't bother writing vmcb->save.rip at vCPU RESET/INITSean Christopherson
Drop unnecessary initialization of vmcb->save.rip during vCPU RESET/INIT, as svm_vcpu_run() unconditionally propagates VCPU_REGS_RIP to save.rip. No true functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Move EDX initialization at vCPU RESET to common codeSean Christopherson
Move the EDX initialization at vCPU RESET, which is now identical between VMX and SVM, into common code. No functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Consolidate APIC base RESET initialization codeSean Christopherson
Consolidate the APIC base RESET logic, which is currently spread out across both x86 and vendor code. For an in-kernel APIC, the vendor code is redundant. But for a userspace APIC, KVM relies on the vendor code to initialize vcpu->arch.apic_base. Hoist the vcpu->arch.apic_base initialization above the !apic check so that it applies to both flavors of APIC emulation, and delete the vendor code. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Drop explicit MMU reset at RESET/INITSean Christopherson
Drop an explicit MMU reset in SVM's vCPU RESET/INIT flow now that the common x86 path correctly handles conditional MMU resets, e.g. if INIT arrives while the vCPU is in 64-bit mode. This reverts commit ebae871a509d ("kvm: svm: reset mmu on VCPU reset"). Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Fall back to KVM's hardcoded value for EDX at RESET/INITSean Christopherson
At vCPU RESET/INIT (mostly RESET), stuff EDX with KVM's hardcoded, default Family-Model-Stepping ID of 0x600 if CPUID.0x1 isn't defined. At RESET, the CPUID lookup is guaranteed to "miss" because KVM emulates RESET before exposing the vCPU to userspace, i.e. userspace can't possibly have done set the vCPU's CPUID model, and thus KVM will always write '0'. At INIT, using 0x600 is less bad than using '0'. While initializing EDX to '0' is _extremely_ unlikely to be noticed by the guest, let alone break the guest, and can be overridden by userspace for the RESET case, using 0x600 is preferable as it will allow consolidating the relevant VMX and SVM RESET/INIT logic in the future. And, digging through old specs suggests that neither Intel nor AMD have ever shipped a CPU that initialized EDX to '0' at RESET. Regarding 0x600 as KVM's default Family, it is a sane default and in many ways the most appropriate. Prior to the 386 implementations, DX was undefined at RESET. With the 386, 486, 586/P5, and 686/P6/Athlon, both Intel and AMD set EDX to 3, 4, 5, and 6 respectively. AMD switched to using '15' as its primary Family with the introduction of AMD64, but Intel has continued using '6' for the last few decades. So, '6' is a valid Family for both Intel and AMD CPUs, is compatible with both 32-bit and 64-bit CPUs (albeit not a perfect fit for 64-bit AMD), and of the common Families (3 - 6), is the best fit with respect to KVM's virtual CPU model. E.g. prior to the P6, Intel CPUs did not have a STI window. Modern operating systems, Linux included, rely on the STI window, e.g. for "safe halt", and KVM unconditionally assumes the virtual CPU has an STI window. Thus enumerating a Family ID of 3, 4, or 5 would be provably wrong. Opportunistically remove a stale comment. Fixes: 66f7b72e1171 ("KVM: x86: Make register state after reset conform to specification") Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Require exact CPUID.0x1 match when stuffing EDX at INITSean Christopherson
Do not allow an inexact CPUID "match" when querying the guest's CPUID.0x1 to stuff EDX during INIT. In the common case, where the guest CPU model is an AMD variant, allowing an inexact match is a nop since KVM doesn't emulate Intel's goofy "out-of-range" logic for AMD and Hygon. If the vCPU model happens to be an Intel variant, an inexact match is possible if and only if the max CPUID leaf is precisely '0'. Aside from the fact that there's probably no CPU in existence with a single CPUID leaf, if the max CPUID leaf is '0', that means that CPUID.0.EAX is '0', and thus an inexact match for CPUID.0x1.EAX will also yield '0'. So, with lots of twisty logic, no functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Zero out GDTR.base and IDTR.base on INITSean Christopherson
Explicitly set GDTR.base and IDTR.base to zero when intializing the VMCB. Functionally this only affects INIT, as the bases are implicitly set to zero on RESET by virtue of the VMCB being zero allocated. Per AMD's APM, GDTR.base and IDTR.base are zeroed after RESET and INIT. Fixes: 04d2cc7780d4 ("KVM: Move main vcpu loop into subarch independent code") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the VMSean Christopherson
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <0e8760a26151f47dc47052b25ca8b84fffe0641e.1625186503.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrlMaxim Levitsky
Currently when SVM is enabled in guest CPUID, AVIC is inhibited as soon as the guest CPUID is set. AVIC happens to be fully disabled on all vCPUs by the time any guest entry starts (if after migration the entry can be nested). The reason is that currently we disable avic right away on vCPU from which the kvm_request_apicv_update was called and for this case, it happens to be called on all vCPUs (by svm_vcpu_after_set_cpuid). After we stop doing this, AVIC will end up being disabled only when KVM_REQ_APICV_UPDATE is processed which is after we done switching to the nested guest. Fix this by just using vmcb01 in svm_refresh_apicv_exec_ctrl for avic (which is a right thing to do anyway). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: tweak warning about enabled AVIC on nested entryMaxim Levitsky
It is possible that AVIC was requested to be disabled but not yet disabled, e.g if the nested entry is done right after svm_vcpu_after_set_cpuid. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be ↵Maxim Levitsky
deactivated It is possible for AVIC inhibit and AVIC active state to be mismatched. Currently we disable AVIC right away on vCPU which started the AVIC inhibit request thus this warning doesn't trigger but at least in theory, if svm_set_vintr is called at the same time on multiple vCPUs, the warning can happen. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>