QEMU Virtual NVDIMM =================== This document explains the usage of virtual NVDIMM (vNVDIMM) feature which is available since QEMU v2.6.0. The current QEMU only implements the persistent memory mode of vNVDIMM device and not the block window mode. Basic Usage ----------- The storage of a vNVDIMM device in QEMU is provided by the memory backend (i.e. memory-backend-file and memory-backend-ram). A simple way to create a vNVDIMM device at startup time is done via the following command line options: -machine pc,nvdimm -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE -device nvdimm,id=nvdimm1,memdev=mem1 Where, - the "nvdimm" machine option enables vNVDIMM feature. - "slots=$N" should be equal to or larger than the total amount of normal RAM devices and vNVDIMM devices, e.g. $N should be >= 2 here. - "maxmem=$MAX_SIZE" should be equal to or larger than the total size of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be >= $RAM_SIZE + $NVDIMM_SIZE here. - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE" creates a backend storage of size $NVDIMM_SIZE on a file $PATH. All accesses to the virtual NVDIMM device go to the file $PATH. "share=on/off" controls the visibility of guest writes. If "share=on", then guest writes will be applied to the backend file. If another guest uses the same backend file with option "share=on", then above writes will be visible to it as well. If "share=off", then guest writes won't be applied to the backend file and thus will be invisible to other guests. - "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM device whose storage is provided by above memory backend device. Multiple vNVDIMM devices can be created if multiple pairs of "-object" and "-device" are provided. For above command line options, if the guest OS has the proper NVDIMM driver, it should be able to detect a NVDIMM device which is in the persistent memory mode and whose size is $NVDIMM_SIZE. Note: 1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual backend file size is not equal to the size given by "size" option, QEMU will truncate the backend file by ftruncate(2), which will corrupt the existing data in the backend file, especially for the shrink case. QEMU v2.8.0 and later check the backend file size and the "size" option. If they do not match, QEMU will report errors and abort in order to avoid the data corruption. 2. QEMU v2.6.0 only puts a basic alignment requirement on the "size" option of memory-backend-file, e.g. 4KB alignment on x86. However, QEMU v.2.7.0 puts an additional alignment requirement, which may require a larger value than the basic one, e.g. 2MB on x86. This change breaks the usage of memory-backend-file that only satisfies the basic alignment. QEMU v2.8.0 and later remove the additional alignment on non-s390x architectures, so the broken memory-backend-file can work again. Label ----- QEMU v2.7.0 and later implement the label support for vNVDIMM devices. To enable label on vNVDIMM devices, users can simply add "label-size=$SZ" option to "-device nvdimm", e.g. -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K Note: 1. The minimal label size is 128KB. 2. QEMU v2.7.0 and later store labels at the end of backend storage. If a memory backend file, which was previously used as the backend of a vNVDIMM device without labels, is now used for a vNVDIMM device with label, the data in the label area at the end of file will be inaccessible to the guest. If any useful data (e.g. the meta-data of the file system) was stored there, the latter usage may result guest data corruption (e.g. breakage of guest file system). Hotplug ------- QEMU v2.8.0 and later implement the hotplug support for vNVDIMM devices. Similarly to the RAM hotplug, the vNVDIMM hotplug is accomplished by two monitor commands "object_add" and "device_add". For example, the following commands add another 4GB vNVDIMM device to the guest: (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2 Note: 1. Each hotplugged vNVDIMM device consumes one memory slot. Users should always ensure the memory option "-m ...,slots=N" specifies enough number of slots, i.e. N >= number of RAM devices + number of statically plugged vNVDIMM devices + number of hotplugged vNVDIMM devices 2. The similar is required for the memory option "-m ...,maxmem=M", i.e. M >= size of RAM devices + size of statically plugged vNVDIMM devices + size of hotplugged vNVDIMM devices Alignment --------- QEMU uses mmap(2) to maps vNVDIMM backends and aligns the mapping address to the page size (getpagesize(2)) by default. However, some types of backends may require an alignment different than the page size. In that case, QEMU v2.12.0 and later provide 'align' option to memory-backend-file to allow users to specify the proper alignment. For example, device dax require the 2 MB alignment, so we can use following QEMU command line options to use it (/dev/dax0.0) as the backend of vNVDIMM: -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M -device nvdimm,id=nvdimm1,memdev=mem1 Guest Data Persistence ---------------------- Though QEMU supports multiple types of vNVDIMM backends on Linux, currently the only one that can guarantee the guest write persistence is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to which all guest access do not involve any host-side kernel cache. When using other types of backends, it's suggested to set 'unarmed' option of '-device nvdimm' to 'on', which sets the unarmed flag of the guest NVDIMM region mapping structure. This unarmed flag indicates guest software that this vNVDIMM device contains a region that cannot accept persistent writes. In result, for example, the guest Linux NVDIMM driver, marks such vNVDIMM device as read-only.