Diffstat (limited to 'docs/system/i386')
-rw-r--r--  docs/system/i386/amd-memory-encryption.rst  206
-rw-r--r--  docs/system/i386/hyperv.rst                  288
-rw-r--r--  docs/system/i386/kvm-pv.rst                  100
-rw-r--r--  docs/system/i386/sgx.rst                     188
-rw-r--r--  docs/system/i386/xen.rst                     144
5 files changed, 926 insertions(+), 0 deletions(-)
diff --git a/docs/system/i386/amd-memory-encryption.rst b/docs/system/i386/amd-memory-encryption.rst
new file mode 100644
index 0000000000..e9bc142bc1
--- /dev/null
+++ b/docs/system/i386/amd-memory-encryption.rst
@@ -0,0 +1,206 @@
+AMD Secure Encrypted Virtualization (SEV)
+=========================================
+
+Secure Encrypted Virtualization (SEV) is a feature found on AMD processors.
+
+SEV is an extension to the AMD-V architecture which supports running encrypted
+virtual machines (VMs) under the control of KVM. Encrypted VMs have their pages
+(code and data) secured such that only the guest itself has access to the
+unencrypted version. Each encrypted VM is associated with a unique encryption
+key; if its data is accessed by a different entity using a different key, the
+encrypted guest's data will be decrypted incorrectly, yielding unintelligible
+data.
+
+Key management for this feature is handled by a separate processor known as the
+AMD Secure Processor (AMD-SP), which is present in AMD SoCs. Firmware running
+inside the AMD-SP provides commands to support a common VM lifecycle. This
+includes commands for launching, snapshotting, migrating and debugging the
+encrypted guest. These SEV commands can be issued via KVM_MEMORY_ENCRYPT_OP
+ioctls.
+
+Secure Encrypted Virtualization - Encrypted State (SEV-ES) builds on the SEV
+support to additionally protect the guest register state. In order to allow a
+hypervisor to perform functions on behalf of a guest, there is architectural
+support for notifying a guest's operating system when certain types of VMEXITs
+are about to occur. This allows the guest to selectively share information with
+the hypervisor to satisfy the requested function.
+
+Launching
+---------
+
+Boot images (such as the firmware/BIOS) must be encrypted before a guest can be
+booted. The ``KVM_MEMORY_ENCRYPT_OP`` ioctl provides commands to encrypt the
+images: ``LAUNCH_START``, ``LAUNCH_UPDATE_DATA``, ``LAUNCH_MEASURE`` and
+``LAUNCH_FINISH``. These four commands together generate a fresh memory
+encryption key for the VM, encrypt the boot images and provide a measurement
+that can be used as an attestation of a successful launch.
+
+For a SEV-ES guest, the ``LAUNCH_UPDATE_VMSA`` command is also used to encrypt the
+guest register state, or VM save area (VMSA), for all of the guest vCPUs.
+
+``LAUNCH_START`` is called first to create a cryptographic launch context within
+the firmware. To create this context, the guest owner must provide a guest
+policy, its public Diffie-Hellman key (PDH) and session parameters. These inputs
+should be treated as binary blobs and must be passed as-is to the SEV firmware.
+
+The guest policy is passed as plaintext. A hypervisor may choose to read it,
+but should not modify it (any modification of the policy bits will result
+in a bad measurement). The guest policy is a 4-byte data structure containing
+several flags that restrict what can be done with a running SEV guest.
+See SEV API Spec ([SEVAPI]_) sections 3 and 6.2 for more details.
+
+The guest policy can be provided via the ``policy`` property::
+
+ # ${QEMU} \
+   -object sev-guest,id=sev0,policy=0x1...\
+
+Setting the "SEV-ES required" policy bit (bit 2) will launch the guest as a
+SEV-ES guest::
+
+ # ${QEMU} \
+   -object sev-guest,id=sev0,policy=0x5...\
+
+The DH certificate and session parameters provided by the guest owner are used
+to establish a cryptographic session with the guest owner, over which the keys
+used for the attestation are negotiated.
+
+The DH certificate and session blob can be provided via the ``dh-cert-file`` and
+``session-file`` properties::
+
+ # ${QEMU} \
+   -object sev-guest,id=sev0,dh-cert-file=<file1>,session-file=<file2>
+
+``LAUNCH_UPDATE_DATA`` encrypts the memory region using the cryptographic context
+created via the ``LAUNCH_START`` command. If required, this command can be called
+multiple times to encrypt different memory regions. The command also calculates
+the measurement of the memory contents as it encrypts.
+
+``LAUNCH_UPDATE_VMSA`` encrypts all the vCPU VMSAs for a SEV-ES guest using the
+cryptographic context created via the ``LAUNCH_START`` command. The command also
+calculates the measurement of the VMSAs as it encrypts them.
+
+``LAUNCH_MEASURE`` can be used to retrieve the measurement of encrypted memory and,
+for a SEV-ES guest, encrypted VMSAs. This measurement is a signature of the
+memory contents and, for a SEV-ES guest, the VMSA contents, that can be sent
+to the guest owner as an attestation that the memory and VMSAs were encrypted
+correctly by the firmware. The guest owner may wait to provide the guest
+confidential information until it can verify the attestation measurement.
+Since the guest owner knows the initial contents of the guest at boot, the
+attestation measurement can be verified by comparing it to what the guest owner
+expects.
+
+``LAUNCH_FINISH`` finalizes the guest launch and destroys the cryptographic
+context.
+
+See SEV API Spec ([SEVAPI]_) 'Launching a guest' usage flow (Appendix A) for the
+complete flow chart.
+
+To launch a SEV guest::
+
+ # ${QEMU} \
+ -machine ...,confidential-guest-support=sev0 \
+ -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1
+
+To launch a SEV-ES guest::
+
+ # ${QEMU} \
+ -machine ...,confidential-guest-support=sev0 \
+ -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x5
+
+An SEV-ES guest has some restrictions compared to a SEV guest. Because the
+guest register state is encrypted and cannot be updated by the VMM/hypervisor,
+a SEV-ES guest:
+
+ - Does not support SMM - SMM support requires updating the guest register
+ state.
+ - Does not support reboot - a system reset requires updating the guest register
+ state.
+ - Requires in-kernel irqchip - the burden is placed on the hypervisor to
+ manage booting APs.
+
+Calculating expected guest launch measurement
+---------------------------------------------
+
+In order to verify the guest launch measurement, the guest owner must compute
+it in exactly the same way as it is calculated by the AMD-SP. SEV API Spec
+([SEVAPI]_) section 6.5.1 describes the AMD-SP operations:
+
+ GCTX.LD is finalized, producing the hash digest of all plaintext data
+ imported into the guest.
+
+ The launch measurement is calculated as:
+
+ HMAC(0x04 || API_MAJOR || API_MINOR || BUILD || GCTX.POLICY || GCTX.LD || MNONCE; GCTX.TIK)
+
+ where "||" represents concatenation.
+
+The values of API_MAJOR, API_MINOR, BUILD, and GCTX.POLICY can be obtained
+from the ``query-sev`` QMP command.
+
+The value of MNONCE is part of the response of ``query-sev-launch-measure``: it
+is the last 16 bytes of the base64-decoded data field (see SEV API Spec
+([SEVAPI]_) section 6.5.2 Table 52: LAUNCH_MEASURE Measurement Buffer).
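+
+As a minimal illustrative sketch (the field values below are made up for the
+example and are not real output), both commands can be issued over QMP::
+
+ -> { "execute": "query-sev" }
+ <- { "return": { "enabled": true, "api-major": 1, "api-minor": 51,
+                  "build-id": 3, "policy": 5, "state": "launch-secret",
+                  "handle": 1 } }
+
+ -> { "execute": "query-sev-launch-measure" }
+ <- { "return": { "data": "<base64 of measurement || MNONCE>" } }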
+
+The value of GCTX.LD is
+``SHA256(firmware_blob || kernel_hashes_blob || vmsas_blob)``, where:
+
+* ``firmware_blob`` is the content of the entire firmware flash file (for
+ example, ``OVMF.fd``). Note that you must build a stateless firmware file
+ which doesn't use an NVRAM store, because the NVRAM area is not measured, and
+ therefore it is not secure to use a firmware which uses state from an NVRAM
+ store.
+* if a kernel is used, and ``kernel-hashes=on``, then ``kernel_hashes_blob`` is
+ the content of PaddedSevHashTable (including the zero padding), which itself
+ includes the hashes of kernel, initrd, and cmdline that are passed to the
+ guest. The PaddedSevHashTable struct is defined in ``target/i386/sev.c``.
+* if SEV-ES is enabled (``policy & 0x4 != 0``), ``vmsas_blob`` is the
+ concatenation of all VMSAs of the guest vCPUs. Each VMSA is 4096 bytes long;
+ its content is defined inside Linux kernel code as ``struct vmcb_save_area``,
+ or in AMD APM Volume 2 ([APMVOL2]_) Table B-2: VMCB Layout, State Save Area.
+
+If kernel hashes are not used, or SEV-ES is disabled, use empty blobs for
+``kernel_hashes_blob`` and ``vmsas_blob`` as needed.
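+
+As a minimal sketch, assume a plain SEV guest (no kernel hashes, no SEV-ES), so
+that GCTX.LD reduces to the SHA256 of the firmware image, and assume the header
+bytes ``0x04 || API_MAJOR || API_MINOR || BUILD || GCTX.POLICY || GCTX.LD ||
+MNONCE`` have already been collected into a hypothetical ``header.bin`` and the
+TIK is available as a hex string in ``$TIK``. The measurement (an HMAC-SHA256
+per the SEV API) could then be recomputed with standard tools::
+
+ # GCTX.LD for a plain SEV guest: SHA256 of the stateless firmware image
+ $ sha256sum OVMF.fd
+
+ # HMAC-SHA256 over the measurement header, keyed with the TIK
+ $ openssl dgst -sha256 -mac HMAC -macopt hexkey:"$TIK" header.bin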
+
+Debugging
+---------
+
+Since the memory contents of a SEV guest are encrypted, hypervisor accesses to
+the guest memory return ciphertext. If the guest policy allows debugging, the
+hypervisor can use the ``DEBUG_DECRYPT`` and ``DEBUG_ENCRYPT`` commands to
+access the guest memory region for debug purposes. This is not supported in
+QEMU yet.
+
+Snapshot/Restore
+----------------
+
+TODO
+
+Live Migration
+---------------
+
+TODO
+
+References
+----------
+
+`AMD Memory Encryption whitepaper
+<https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/white-papers/memory-encryption-white-paper.pdf>`_
+
+.. [SEVAPI] `Secure Encrypted Virtualization API
+ <https://www.amd.com/system/files/TechDocs/55766_SEV-KM_API_Specification.pdf>`_
+
+.. [APMVOL2] `AMD64 Architecture Programmer's Manual Volume 2: System Programming
+ <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf>`_
+
+KVM Forum slides:
+
+* `AMD’s Virtualization Memory Encryption (2016)
+ <http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf>`_
+* `Extending Secure Encrypted Virtualization With SEV-ES (2018)
+ <https://www.linux-kvm.org/images/9/94/Extending-Secure-Encrypted-Virtualization-with-SEV-ES-Thomas-Lendacky-AMD.pdf>`_
+
+`AMD64 Architecture Programmer's Manual:
+<https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf>`_
+
+* SME is section 7.10
+* SEV is section 15.34
+* SEV-ES is section 15.35
diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst
new file mode 100644
index 0000000000..2505dc4c86
--- /dev/null
+++ b/docs/system/i386/hyperv.rst
@@ -0,0 +1,288 @@
+Hyper-V Enlightenments
+======================
+
+
+Description
+-----------
+
+In some cases, when implementing a hardware interface in software is slow, KVM
+implements its own paravirtualized interfaces instead. This works well for
+Linux, as guest support for such features is added simultaneously with the
+features themselves. It may, however, be hard or even impossible to add support
+for these interfaces to proprietary OSes, most notably Microsoft Windows.
+
+KVM on x86 implements Hyper-V Enlightenments for Windows guests. These features
+make Windows and Hyper-V guests think they're running on top of a Hyper-V
+compatible hypervisor and use Hyper-V specific features.
+
+
+Setup
+-----
+
+No Hyper-V enlightenments are enabled by default by either KVM or QEMU. In
+QEMU, individual enlightenments can be enabled through CPU flags, e.g.:
+
+.. parsed-literal::
+
+ |qemu_system| --enable-kvm --cpu host,hv_relaxed,hv_vpindex,hv_time, ...
+
+Sometimes there are dependencies between enlightenments; QEMU is supposed to
+check that the supplied configuration is sane.
+
+When any set of the Hyper-V enlightenments is enabled, QEMU changes hypervisor
+identification (CPUID 0x40000000..0x4000000A) to Hyper-V. KVM identification
+and features are kept in leaves 0x40000100..0x40000101.
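+
+As an illustrative sketch only (the exact set and the ``hv-spinlocks`` retry
+count are arbitrary example values), a Windows guest is typically given a
+larger group of the enlightenments documented below:
+
+.. parsed-literal::
+
+ |qemu_system| --enable-kvm --cpu host,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-runtime,hv-time,hv-synic,hv-stimer,hv-frequencies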
+
+
+Existing enlightenments
+-----------------------
+
+``hv-relaxed``
+ This feature tells the guest OS to disable watchdog timeouts as it is running
+ on a hypervisor. It is known that some Windows versions will do this even when
+ they see the 'hypervisor' CPU flag.
+
+``hv-vapic``
+ Provides the so-called VP Assist page MSR to the guest, allowing it to work
+ with the APIC more efficiently. In particular, this enlightenment allows
+ paravirtualized (exit-less) EOI processing.
+
+``hv-spinlocks`` = xxx
+ Enables paravirtualized spinlocks. The parameter indicates how many times
+ spinlock acquisition should be attempted before indicating the situation to the
+ hypervisor. A special value 0xffffffff indicates "never notify".
+
+``hv-vpindex``
+ Provides the HV_X64_MSR_VP_INDEX (0x40000002) MSR to the guest, which contains
+ the virtual processor index. This enlightenment makes sense in conjunction with
+ hv-synic, hv-stimer and other enlightenments which require the guest to know
+ its virtual processor indices (e.g. when a VP index needs to be passed in a
+ hypercall).
+
+``hv-runtime``
+ Provides the HV_X64_MSR_VP_RUNTIME (0x40000010) MSR to the guest. The MSR keeps
+ the virtual processor run time in 100ns units. This gives the guest operating
+ system an idea of how much time was 'stolen' from it (when the virtual CPU was
+ preempted to perform some other work).
+
+``hv-crash``
+ Provides the HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P4 (0x40000100..0x40000104)
+ and HV_X64_MSR_CRASH_CTL (0x40000105) MSRs to the guest. These MSRs are written
+ to by the guest when it crashes; the HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P4
+ MSRs contain additional crash information. This information is output to the
+ QEMU log and through QAPI.
+ Note: unlike under genuine Hyper-V, a write to HV_X64_MSR_CRASH_CTL causes the
+ guest to shut down. This effectively blocks crash dump generation by Windows.
+
+``hv-time``
+ Enables two Hyper-V-specific clocksources available to the guest: the MSR-based
+ Hyper-V clocksource (HV_X64_MSR_TIME_REF_COUNT, 0x40000020) and the Reference
+ TSC page (enabled via MSR HV_X64_MSR_REFERENCE_TSC, 0x40000021). Both
+ clocksources are per-guest; the Reference TSC page clocksource allows for
+ exit-less timestamp readings. Using this enlightenment leads to a significant
+ speedup of all timestamp-related operations.
+
+``hv-synic``
+ Enables the Hyper-V Synthetic Interrupt Controller - an extension of the local
+ APIC. When enabled, this enlightenment provides additional communication
+ facilities to the guest: SynIC messages and events. This is a prerequisite for
+ implementing VMBus devices (not yet in QEMU). Additionally, this enlightenment
+ is needed to enable Hyper-V synthetic timers. SynIC is controlled through MSRs
+ HV_X64_MSR_SCONTROL..HV_X64_MSR_EOM (0x40000080..0x40000084) and
+ HV_X64_MSR_SINT0..HV_X64_MSR_SINT15 (0x40000090..0x4000009F).
+
+ Requires: ``hv-vpindex``
+
+``hv-stimer``
+ Enables Hyper-V synthetic timers. There are four synthetic timers per virtual
+ CPU controlled through HV_X64_MSR_STIMER0_CONFIG..HV_X64_MSR_STIMER3_COUNT
+ (0x400000B0..0x400000B7) MSRs. These timers can work either in single-shot or
+ periodic mode. It is known that certain Windows versions revert to using HPET
+ (or even RTC when HPET is unavailable) extensively when this enlightenment is
+ not provided; this can lead to significant CPU consumption, even when the
+ virtual CPU is idle.
+
+ Requires: ``hv-vpindex``, ``hv-synic``, ``hv-time``
+
+``hv-tlbflush``
+ Enables the paravirtualized TLB shoot-down mechanism. On the x86 architecture,
+ the remote TLB flush procedure requires sending IPIs and waiting for other CPUs
+ to perform a local TLB flush. In a virtualized environment some virtual CPUs
+ may not even be scheduled at the time of the call and may not require flushing
+ (or, flushing may be postponed until the virtual CPU is scheduled). The
+ hv-tlbflush enlightenment implements TLB shoot-down through the hypervisor,
+ enabling this optimization.
+
+ Requires: ``hv-vpindex``
+
+``hv-ipi``
+ Enables the paravirtualized IPI send mechanism. The HvCallSendSyntheticClusterIpi
+ hypercall may target more than 64 virtual CPUs simultaneously; doing the same
+ through the APIC requires more than one access (and thus more than one exit to
+ the hypervisor).
+
+ Requires: ``hv-vpindex``
+
+``hv-vendor-id`` = xxx
+ This changes Hyper-V identification in CPUID 0x40000000.EBX-EDX from the default
+ "Microsoft Hv". The parameter should be no longer than 12 characters. According
+ to the specification, guests shouldn't use this information and it is unknown
+ if there is a Windows version which acts differently.
+ Note: hv-vendor-id is not an enlightenment and thus doesn't enable Hyper-V
+ identification when specified without some other enlightenment.
+
+``hv-reset``
+ Provides the HV_X64_MSR_RESET (0x40000003) MSR to the guest, allowing it to
+ reset itself by writing to it. Even when this MSR is enabled, it is not the
+ recommended way for Windows to perform a system reboot and thus it may not
+ actually be used.
+
+``hv-frequencies``
+ Provides HV_X64_MSR_TSC_FREQUENCY (0x40000022) and HV_X64_MSR_APIC_FREQUENCY
+ (0x40000023) allowing the guest to get its TSC/APIC frequencies without doing
+ measurements.
+
+``hv-reenlightenment``
+ This enlightenment is nested-specific: it targets Hyper-V on KVM guests. When
+ enabled, it provides the HV_X64_MSR_REENLIGHTENMENT_CONTROL (0x40000106),
+ HV_X64_MSR_TSC_EMULATION_CONTROL (0x40000107) and HV_X64_MSR_TSC_EMULATION_STATUS
+ (0x40000108) MSRs, allowing the guest to get notified when the TSC frequency
+ changes (which only happens on migration) and keep using the old frequency
+ (through emulation in the hypervisor) until it is ready to switch to the new
+ one. This, in conjunction with ``hv-frequencies``, allows Hyper-V on KVM to
+ pass a stable clocksource (Reference TSC page) to its own guests.
+
+ Note: KVM doesn't fully support re-enlightenment notifications and doesn't
+ emulate TSC accesses after migration, so the ``tsc-frequency=`` CPU option also
+ has to be specified to make migration succeed. The destination host has to
+ either have the same TSC frequency or support the TSC scaling CPU feature.
+
+ Recommended: ``hv-frequencies``
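+
+ For example (a sketch only; the TSC frequency value is an arbitrary
+ illustration and must match the source host):
+
+ .. parsed-literal::
+
+  |qemu_system| --enable-kvm --cpu host,hv-frequencies,hv-reenlightenment,tsc-frequency=2500000000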
+
+``hv-evmcs``
+ This enlightenment is nested-specific: it targets Hyper-V on KVM guests. When
+ enabled, it provides the Enlightened VMCS version 1 feature to the guest. The
+ feature implements a paravirtualized protocol between the L0 (KVM) and L1
+ (Hyper-V) hypervisors, making L2 exits to the hypervisor faster. The feature is
+ Intel-only.
+
+ Note: some virtualization features (e.g. Posted Interrupts) are disabled when
+ hv-evmcs is enabled. It may make sense to measure your nested workload with and
+ without the feature to find out if enabling it is beneficial.
+
+ Requires: ``hv-vapic``
+
+``hv-stimer-direct``
+ The Hyper-V specification allows synthetic timers to operate in two modes:
+ "classic", where the expiration event is delivered as a SynIC message, and
+ "direct", where the event is delivered via a normal interrupt. It is known that
+ nested Hyper-V can only use synthetic timers in direct mode and thus
+ ``hv-stimer-direct`` needs to be enabled.
+
+ Requires: ``hv-vpindex``, ``hv-synic``, ``hv-time``, ``hv-stimer``
+
+``hv-avic`` (``hv-apicv``)
+ This enlightenment allows the use of Hyper-V SynIC with hardware APICv/AVIC
+ enabled. Normally, Hyper-V SynIC disables these hardware features and suggests
+ that the guest use the paravirtualized AutoEOI feature instead.
+ Note: enabling this feature on old hardware (without APICv/AVIC support) may
+ have a negative effect on the guest's performance.
+
+``hv-no-nonarch-coresharing`` = on/off/auto
+ This enlightenment tells the guest OS that virtual processors will never share
+ a physical core unless they are reported as sibling SMT threads. This
+ information is required by Windows and Hyper-V guests to properly mitigate
+ SMT-related CPU vulnerabilities.
+
+ When the option is set to 'auto', QEMU will enable the feature only when KVM
+ reports that non-architectural coresharing is impossible; this means that
+ hyper-threading is not supported or is completely disabled on the host. This
+ setting also prevents migration, as SMT settings on the destination may differ.
+ When the option is set to 'on', QEMU will always enable the feature, regardless
+ of the host setup. To keep guests secure, this can only be used in conjunction
+ with exposing the correct vCPU topology and vCPU pinning.
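+
+ As a sketch only (the 4-vCPU SMT topology shown is an arbitrary example, and
+ vCPU pinning is assumed to be handled outside QEMU):
+
+ .. parsed-literal::
+
+  |qemu_system| --enable-kvm -smp 4,sockets=1,cores=2,threads=2 --cpu host,hv-no-nonarch-coresharing=on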
+
+``hv-version-id-build``, ``hv-version-id-major``, ``hv-version-id-minor``, ``hv-version-id-spack``, ``hv-version-id-sbranch``, ``hv-version-id-snumber``
+ This changes Hyper-V version identification in CPUID 0x40000002.EAX-EDX from the
+ default (WS2016).
+
+ - ``hv-version-id-build`` sets 'Build Number' (32 bits)
+ - ``hv-version-id-major`` sets 'Major Version' (16 bits)
+ - ``hv-version-id-minor`` sets 'Minor Version' (16 bits)
+ - ``hv-version-id-spack`` sets 'Service Pack' (32 bits)
+ - ``hv-version-id-sbranch`` sets 'Service Branch' (8 bits)
+ - ``hv-version-id-snumber`` sets 'Service Number' (24 bits)
+
+ Note: hv-version-id-* are not enlightenments and thus don't enable Hyper-V
+ identification when specified without any other enlightenments.
+
+``hv-syndbg``
+ Enables the Hyper-V synthetic debugger interface. This is a special interface
+ used by the Windows kernel debugger to send packets through, rather than
+ sending them via serial or network connections, which gives a significant
+ performance boost over those channels.
+ When enabled, this enlightenment provides additional communication facilities
+ to the guest: SynDbg messages.
+ This enlightenment requires a VMBus device (``-device vmbus-bridge,irq=15``).
+
+ Requires: ``hv-relaxed``, ``hv-time``, ``hv-vapic``, ``hv-vpindex``, ``hv-synic``, ``hv-runtime``, ``hv-stimer``
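+
+ A hedged sketch of enabling the synthetic debugger stack, using the flag list
+ from the requirements above:
+
+ .. parsed-literal::
+
+  |qemu_system| --enable-kvm -device vmbus-bridge,irq=15 --cpu host,hv-relaxed,hv-time,hv-vapic,hv-vpindex,hv-synic,hv-runtime,hv-stimer,hv-syndbg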
+
+``hv-emsr-bitmap``
+ This enlightenment is nested-specific: it targets Hyper-V on KVM guests. When
+ enabled, it allows the L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to
+ avoid unnecessary updates to the L2 MSR bitmap upon vmexits. While the protocol
+ is supported for both VMX (Intel) and SVM (AMD), the VMX implementation
+ requires the Enlightened VMCS (``hv-evmcs``) feature to also be enabled.
+
+ Recommended: ``hv-evmcs`` (Intel)
+
+``hv-xmm-input``
+ The Hyper-V specification allows passing parameters for certain hypercalls in
+ XMM registers ("XMM Fast Hypercall Input"). When the feature is in use, it
+ allows for faster hypercall processing, as KVM can avoid reading the guest's
+ memory.
+
+``hv-tlbflush-ext``
+ Allows extended GVA ranges to be passed to Hyper-V TLB flush hypercalls
+ (HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).
+
+ Requires: ``hv-tlbflush``
+
+``hv-tlbflush-direct``
+ This enlightenment is nested-specific: it targets Hyper-V on KVM guests. When
+ enabled, it allows L0 (KVM) to directly handle TLB flush hypercalls from an L2
+ guest without the need to exit to the L1 (Hyper-V) hypervisor. While the
+ feature is supported for both VMX (Intel) and SVM (AMD), the VMX implementation
+ requires the Enlightened VMCS (``hv-evmcs``) feature to also be enabled.
+
+ Requires: ``hv-vapic``
+
+ Recommended: ``hv-evmcs`` (Intel)
+
+Supplementary features
+----------------------
+
+``hv-passthrough``
+ In some cases (e.g. during development) it may make sense to use QEMU in
+ 'pass-through' mode and give Windows guests all enlightenments currently
+ supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU
+ flag.
+
+ Note: the ``hv-passthrough`` flag only enables enlightenments which are known to
+ QEMU (have a corresponding 'hv-' flag) and copies the ``hv-spinlocks`` and
+ ``hv-vendor-id`` values from KVM to QEMU. ``hv-passthrough`` overrides all other
+ 'hv-' settings on the command line. Also, enabling this flag effectively
+ prevents migration, as the list of enabled enlightenments may differ between
+ the source and destination hosts.
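+
+ For example (development use only, as noted above):
+
+ .. parsed-literal::
+
+  |qemu_system| --enable-kvm --cpu host,hv-passthrough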
+
+``hv-enforce-cpuid``
+ By default, KVM allows the guest to use all currently supported Hyper-V
+ enlightenments once the Hyper-V CPUID interface is exposed, regardless of
+ whether individual features were announced in the guest-visible CPUID leaves.
+ The ``hv-enforce-cpuid`` feature alters this behavior and only allows the guest
+ to use the exposed Hyper-V enlightenments.
+
+
+Useful links
+------------
+Hyper-V Top Level Functional Specification and other information:
+
+- https://github.com/MicrosoftDocs/Virtualization-Documentation
+- https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
+
diff --git a/docs/system/i386/kvm-pv.rst b/docs/system/i386/kvm-pv.rst
new file mode 100644
index 0000000000..1e5a9923ef
--- /dev/null
+++ b/docs/system/i386/kvm-pv.rst
@@ -0,0 +1,100 @@
+Paravirtualized KVM features
+============================
+
+Description
+-----------
+
+In some cases, when implementing hardware interfaces in software is slow, ``KVM``
+implements its own paravirtualized interfaces instead.
+
+Setup
+-----
+
+Paravirtualized ``KVM`` features are represented as CPU flags. The following
+features are enabled by default for any CPU model when ``KVM`` acceleration is
+enabled:
+
+- ``kvmclock``
+- ``kvm-nopiodelay``
+- ``kvm-asyncpf``
+- ``kvm-steal-time``
+- ``kvm-pv-eoi``
+- ``kvmclock-stable-bit``
+
+The ``kvm-msi-ext-dest-id`` feature is enabled by default in x2apic mode with a
+split irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic").
+
+Note: when CPU model ``host`` is used, QEMU passes through all supported
+paravirtualized ``KVM`` features to the guest.
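+
+Individual features can be disabled in the usual way, by prefixing the flag
+with '-' (or appending ``=off``); for example (an illustrative sketch, not a
+recommendation)::
+
+ -cpu host,-kvm-steal-time,-kvm-asyncpf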
+
+Existing features
+-----------------
+
+``kvmclock``
+ Expose a ``KVM`` specific paravirtualized clocksource to the guest. Supported
+ since Linux v2.6.26.
+
+``kvm-nopiodelay``
+ The guest doesn't need to perform delays on PIO operations. Supported since
+ Linux v2.6.26.
+
+``kvm-mmu``
+ This feature is deprecated.
+
+``kvm-asyncpf``
+ Enable asynchronous page fault mechanism. Supported since Linux v2.6.38.
+ Note: since Linux v5.10 the feature is deprecated and not enabled by ``KVM``.
+ Use ``kvm-asyncpf-int`` instead.
+
+``kvm-steal-time``
+ Enable stolen (when guest vCPU is not running) time accounting. Supported
+ since Linux v3.1.
+
+``kvm-pv-eoi``
+ Enable paravirtualized end-of-interrupt signaling. Supported since Linux
+ v3.10.
+
+``kvm-pv-unhalt``
+ Enable paravirtualized spinlocks support. Supported since Linux v3.12.
+
+``kvm-pv-tlb-flush``
+ Enable paravirtualized TLB flush mechanism. Supported since Linux v4.16.
+
+``kvm-pv-ipi``
+ Enable paravirtualized IPI mechanism. Supported since Linux v4.19.
+
+``kvm-poll-control``
+ Enable host-side polling on HLT control from the guest. Supported since Linux
+ v5.10.
+
+``kvm-pv-sched-yield``
+ Enable paravirtualized sched yield feature. Supported since Linux v5.10.
+
+``kvm-asyncpf-int``
+ Enable interrupt based asynchronous page fault mechanism. Supported since Linux
+ v5.10.
+
+``kvm-msi-ext-dest-id``
+ Support 'Extended Destination ID' for external interrupts. The feature allows
+ using up to 32768 CPUs without IRQ remapping (but other limits may apply,
+ making the number of supported vCPUs for a given configuration lower).
+ Supported since Linux v5.10.
+
+``kvmclock-stable-bit``
+ Tell the guest that the guest-visible TSC value can be fully trusted for
+ kvmclock computations and no warps are expected. Supported since Linux v2.6.35.
+
+Supplementary features
+----------------------
+
+``kvm-pv-enforce-cpuid``
+ Limit the supported paravirtualized feature set to the exposed features only.
+ Note: by default, ``KVM`` allows the guest to use all currently supported
+ paravirtualized features even when they were not announced in the guest-visible
+ CPUID leaves. Supported since Linux v5.10.
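+
+ A sketch only, assuming the flag is enabled on the QEMU command line like any
+ other KVM PV feature flag::
+
+  -cpu host,kvm-pv-enforce-cpuid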
+
+
+Useful links
+------------
+
+Please refer to ``Documentation/virt/kvm`` in the Linux kernel sources for
+additional details.
diff --git a/docs/system/i386/sgx.rst b/docs/system/i386/sgx.rst
new file mode 100644
index 0000000000..ab58b29392
--- /dev/null
+++ b/docs/system/i386/sgx.rst
@@ -0,0 +1,188 @@
+Software Guard eXtensions (SGX)
+===============================
+
+Overview
+--------
+
+Intel Software Guard eXtensions (SGX) is a set of instructions and memory
+access mechanisms that provide security guarantees for sensitive applications
+and data. SGX allows an application to set aside part of its address space as
+an *enclave*, a protected area that provides confidentiality and integrity even
+in the presence of privileged malware. Accesses to the enclave memory area from
+any software not resident in the enclave are prevented, including those from
+privileged software.
+
+Virtual SGX
+-----------
+
+The SGX feature is exposed to the guest via the SGX CPUID leaves. For most of
+the SGX CPUID leaves, the same CPUID information can be reported to the guest
+as on the host. By reporting the same CPUID, the guest is able to use the full
+capability of SGX and KVM doesn't need to emulate that information.
+
+The guest's EPC base and size are determined by QEMU, and KVM needs QEMU to
+notify it of this information before it can initialize SGX for the guest.
+
+Virtual EPC
+~~~~~~~~~~~
+
+By default, QEMU does not assign EPC to a VM, i.e. fully enabling SGX in a VM
+requires explicit allocation of EPC to the VM. Similar to other specialized
+memory types, e.g. hugetlbfs, EPC is exposed as a memory backend.
+
+SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be realized
+prior to realizing the vCPUs themselves, which occurs long before generic
+devices are parsed and realized. This limitation means that EPC does not
+require -maxmem as EPC is not treated as {cold,hot}plugged memory.
+
+QEMU does not artificially restrict the number of EPC sections exposed to a
+guest, e.g. QEMU will happily allow you to create 64 1M EPC sections. Be aware
+that some kernels may not recognize all EPC sections, e.g. the Linux SGX driver
+is hardwired to support only 8 EPC sections.
+
+The following QEMU snippet creates two EPC sections, with 64M pre-allocated
+to the VM and an additional 28M mapped but not allocated::
+
+ -object memory-backend-epc,id=mem1,size=64M,prealloc=on \
+ -object memory-backend-epc,id=mem2,size=28M \
+ -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
+
+Note:
+
+The size and location of the virtual EPC are far less restricted compared
+to physical EPC. Because physical EPC is protected via range registers,
+the size of the physical EPC must be a power of two (though software sees
+a subset of the full EPC, e.g. 92M or 128M) and the EPC must be naturally
+aligned. KVM SGX's virtual EPC is purely a software construct and only
+ requires the size and location to be page aligned. QEMU enforces that the EPC
+ size is a multiple of 4k and will ensure that the base of the EPC is 4k aligned.
+To simplify the implementation, EPC is always located above 4g in the guest
+physical address space.
+
+Migration
+~~~~~~~~~
+
+QEMU/KVM doesn't prevent live migrating SGX VMs, although from the hardware's
+perspective, SGX doesn't support live migration, since both the EPC and the SGX
+key hierarchy are bound to the physical platform. However, live migration can be
+supported in the sense that the guest software stack can recreate enclaves when
+it suffers a sudden loss of EPC, and guest enclaves can detect that the SGX keys
+have changed and handle this gracefully. For instance, when ERESUME fails with
+#PF.SGX, guest software can gracefully detect it and recreate enclaves; and when
+an enclave fails to unseal sensitive information provided from outside, it can
+detect the error and have the sensitive information provisioned to it again.
+
+CPUID
+~~~~~
+
+Due to its myriad dependencies, SGX is currently not listed as supported
+in any of QEMU's built-in CPU configurations. To expose SGX (and SGX Launch
+Control) to a guest, you must either use ``-cpu host`` to pass-through the
+host CPU model, or explicitly enable SGX when using a built-in CPU model,
+e.g. via ``-cpu <model>,+sgx`` or ``-cpu <model>,+sgx,+sgxlc``.
+
+All SGX sub-features enumerated through CPUID, e.g. SGX2, MISCSELECT,
+ATTRIBUTES, etc., can be restricted via CPUID flags. Be aware that enforcing
+restriction of MISCSELECT, ATTRIBUTES and XFRM requires intercepting ECREATE,
+i.e. may marginally reduce SGX performance in the guest. All SGX sub-features
+controlled via -cpu are prefixed with "sgx", e.g.::
+
+ $ qemu-system-x86_64 -cpu help | xargs printf "%s\n" | grep sgx
+ sgx
+ sgx-debug
+ sgx-encls-c
+ sgx-enclv
+ sgx-exinfo
+ sgx-kss
+ sgx-mode64
+ sgx-provisionkey
+ sgx-tokenkey
+ sgx1
+ sgx2
+ sgxlc
+
+The following QEMU snippet passes through the host CPU but restricts access to
+the provision and EINIT token keys::
+
+ -cpu host,-sgx-provisionkey,-sgx-tokenkey
+
+SGX sub-features cannot be emulated, i.e. sub-features that are not present
+in hardware cannot be forced on via '-cpu'.
+
+Virtualize SGX Launch Control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+QEMU SGX support for Launch Control (LC) is passive, in the sense that it
+does not actively change the LC configuration. QEMU SGX provides the user
+the ability to set/clear the CPUID flag (and by extension the associated
+IA32_FEATURE_CONTROL MSR bit in fw_cfg) and saves/restores the LE Hash MSRs
+when getting/putting guest state, but QEMU does not add new controls to
+directly modify the LC configuration. Similar to hardware behavior, locking
+the LC configuration to a non-Intel value is left to guest firmware. Unlike
+the host BIOS setting for SGX Launch Control (LC), by design there is no
+special BIOS setting for the SGX guest. Even if the host is in locked mode,
+creating a VM with SGX is still allowed.
+
+Feature Control
+~~~~~~~~~~~~~~~
+
+QEMU SGX updates the ``etc/msr_feature_control`` fw_cfg entry to set the SGX
+(bit 18) and SGX LC (bit 17) flags based on their respective CPUID support,
+i.e. existing guest firmware will automatically set SGX and SGX LC accordingly,
+assuming said firmware supports fw_cfg.msr_feature_control.
+
+Launching a guest
+-----------------
+
+To launch an SGX guest:
+
+.. parsed-literal::
+
+ |qemu_system_x86| \\
+ -cpu host,+sgx-provisionkey \\
+ -object memory-backend-epc,id=mem1,size=64M,prealloc=on \\
+ -M sgx-epc.0.memdev=mem1,sgx-epc.0.node=0
+
+Utilizing SGX in the guest requires a kernel/OS with SGX support.
+The support can be determined in the guest with::
+
+ $ grep sgx /proc/cpuinfo
+
+and the SGX EPC info with::
+
+ $ dmesg | grep sgx
+ [ 0.182807] sgx: EPC section 0x140000000-0x143ffffff
+ [ 0.183695] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
+
+To launch an SGX NUMA guest:
+
+.. parsed-literal::
+
+ |qemu_system_x86| \\
+ -cpu host,+sgx-provisionkey \\
+ -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \\
+ -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \\
+ -numa node,nodeid=0,cpus=0-1,memdev=node0 \\
+ -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \\
+ -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \\
+ -numa node,nodeid=1,cpus=2-3,memdev=node1 \\
+ -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1
+
+and the SGX EPC NUMA info with::
+
+ $ dmesg | grep sgx
+ [ 0.369937] sgx: EPC section 0x180000000-0x183ffffff
+ [ 0.370259] sgx: EPC section 0x184000000-0x185bfffff
+
+ $ dmesg | grep SRAT
+ [ 0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
+ [ 0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
+
+References
+----------
+
+- `SGX Homepage <https://software.intel.com/sgx>`__
+
+- `SGX SDK <https://github.com/intel/linux-sgx.git>`__
+
+- SGX specification: Intel SDM Volume 3
diff --git a/docs/system/i386/xen.rst b/docs/system/i386/xen.rst
new file mode 100644
index 0000000000..46db5f34c1
--- /dev/null
+++ b/docs/system/i386/xen.rst
@@ -0,0 +1,144 @@
+Xen HVM guest support
+=====================
+
+
+Description
+-----------
+
+KVM has support for hosting Xen guests, intercepting Xen hypercalls and event
+channel (Xen PV interrupt) delivery. This allows guests which expect to be
+run under Xen to be hosted in QEMU under Linux/KVM instead.
+
+Using the split irqchip is mandatory for Xen support.
+
+Setup
+-----
+
+Xen mode is enabled by setting the ``xen-version`` property of the KVM
+accelerator, for example for Xen 4.17:
+
+.. parsed-literal::
+
+ |qemu_system| --accel kvm,xen-version=0x40011,kernel-irqchip=split
+
+Additionally, virtual APIC support can be advertised to the guest through the
+``xen-vapic`` CPU flag:
+
+.. parsed-literal::
+
+ |qemu_system| --accel kvm,xen-version=0x40011,kernel-irqchip=split --cpu host,+xen-vapic
+
+When Xen support is enabled, QEMU changes hypervisor identification (CPUID
+0x40000000..0x4000000A) to Xen. The KVM identification and features are not
+advertised to a Xen guest. If Hyper-V is also enabled, the Xen identification
+moves to leaves 0x40000100..0x4000010A.
+
+Properties
+----------
+
+The following properties exist on the KVM accelerator object:
+
+``xen-version``
+ This property contains the Xen version in ``XENVER_version`` form, with the
+ major version in the top 16 bits and the minor version in the low 16 bits.
+ Setting this property enables the Xen guest support. If Xen version 4.5 or
+ greater is specified, the HVM leaf in Xen CPUID is populated. Xen version
+ 4.6 enables the vCPU ID in CPUID, and version 4.17 advertises vCPU upcall
+ vector support to the guest.
+
+``xen-evtchn-max-pirq``
+ Xen PIRQs represent an emulated physical interrupt, either GSI or MSI, which
+ can be routed to an event channel instead of to the emulated I/O or local
+ APIC. By default, QEMU permits only 256 PIRQs because this allows maximum
+ compatibility with 32-bit MSI where the higher bits of the PIRQ# would need
+ to be in the upper 64 bits of the MSI message. For guests with large numbers
+ of PCI devices (and none which are limited to 32-bit addressing) it may be
+ desirable to increase this value.
+
+``xen-gnttab-max-frames``
+ Xen grant tables are the means by which a Xen guest grants access to its
+ memory for PV back ends (disk, network, etc.). Since QEMU only supports v1
+ grant tables which are 8 bytes in size, each page (each frame) of the grant
+ table can reference 512 pages of guest memory. The default number of frames
+ is 64, allowing for 32768 pages of guest memory to be accessed by PV backends
+ through simultaneous grants. For guests with large numbers of PV devices and
+ high throughput, it may be desirable to increase this value.
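+
+For example, both limits can be raised on the accelerator command line (the
+values shown here are arbitrary illustrations):
+
+.. parsed-literal::
+
+ |qemu_system| --accel kvm,xen-version=0x40011,kernel-irqchip=split,xen-evtchn-max-pirq=1024,xen-gnttab-max-frames=128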
+
+Xen paravirtual devices
+-----------------------
+
+The Xen PCI platform device is enabled automatically for a Xen guest. This
+allows a guest to unplug all emulated devices, in order to use paravirtual
+block and network drivers instead.
+
+Those paravirtual Xen block, network (and console) devices can be created
+through the command line, and/or hot-plugged.
+
+To provide a Xen console device, define a character device and then a device
+of type ``xen-console`` to connect to it. For the Xen console equivalent of
+the handy ``-serial mon:stdio`` option, for example:
+
+.. parsed-literal::
+ -chardev stdio,mux=on,id=char0,signal=off -mon char0 \\
+ -device xen-console,chardev=char0
+
+The Xen network device is ``xen-net-device``, which becomes the default NIC
+model for emulated Xen guests, meaning that just the default NIC provided
+by QEMU should automatically work and present a Xen network device to the
+guest.
+
+Disks can be configured with '``-drive file=${GUEST_IMAGE},if=xen``' and will
+appear to the guest as ``xvda`` onwards.
+
+Under Xen, the boot disk is typically available both via IDE emulation, and
+as a PV block device. Guest bootloaders typically use IDE to load the guest
+kernel, which then unplugs the IDE and continues with the Xen PV block device.
+
+This configuration can be achieved as follows:
+
+.. parsed-literal::
+
+ |qemu_system| --accel kvm,xen-version=0x40011,kernel-irqchip=split \\
+ -drive file=${GUEST_IMAGE},if=xen \\
+ -drive file=${GUEST_IMAGE},file.locking=off,if=ide
+
+VirtIO devices can also be used; Linux guests may need to be dissuaded from
+unplugging them by adding '``xen_emul_unplug=never``' on their command line.
+
+Booting Xen PV guests
+---------------------
+
+Booting PV guest kernels is possible by using the Xen PV shim (a version of Xen
+itself, designed to run inside a Xen HVM guest and provide memory management
+services for one guest alone).
+
+The Xen binary is provided as the ``-kernel`` and the guest kernel itself (or
+PV Grub image) as the ``-initrd`` image, which actually just means the first
+multiboot "module". For example:
+
+.. parsed-literal::
+
+ |qemu_system| --accel kvm,xen-version=0x40011,kernel-irqchip=split \\
+ -chardev stdio,id=char0 -device xen-console,chardev=char0 \\
+ -display none -m 1G -kernel xen -initrd bzImage \\
+ -append "pv-shim console=xen,pv -- console=hvc0 root=/dev/xvda1" \\
+ -drive file=${GUEST_IMAGE},if=xen
+
+The Xen image must be built with the ``CONFIG_XEN_GUEST`` and ``CONFIG_PV_SHIM``
+options, and as of Xen 4.17, Xen's PV shim mode does not support using a serial
+port; it must have a Xen console or it will panic.
+
+The example above provides the guest kernel command line after a separator
+(" ``--`` ") on the Xen command line, and does not provide the guest kernel
+with an actual initramfs, which would need to be listed as a second multiboot
+module. For more complicated alternatives, see the command line
+:ref:`documentation <system/invocation-qemu-options-initrd>` for the
+``-initrd`` option.
+
+Host OS requirements
+--------------------
+
+The minimal Xen support in the KVM accelerator requires the host to be running
+Linux v5.12 or newer. Later versions add optimisations: Linux v5.17 added
+acceleration of interrupt delivery via the Xen PIRQ mechanism, and Linux v5.19
+accelerated Xen PV timers and inter-processor interrupts (IPIs).