target-arm: add support for v8 VMULL.P64 instruction

Add support for the VMULL.P64 polynomial 64x64 to 128 bit multiplication
instruction in the A32/T32 instruction sets; this is part of the v8
Crypto Extensions.

To do this we have to move the neon_pmull_64_{lo,hi} helpers from
helper-a64.c into neon_helper.c so they can be used by the AArch32
translator.

Inspired-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I started with Ard's patch but pretty much all the code is different
since we use the A64 helpers now. This version also UNDEFs correctly
for size == 1, and sets the ELF HWCAP2 bit.
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 9bda262..3241fec 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -468,6 +468,7 @@
     uint32_t hwcaps = 0;
 
     GET_FEATURE(ARM_FEATURE_V8_AES, ARM_HWCAP2_ARM_AES);
+    GET_FEATURE(ARM_FEATURE_V8_PMULL, ARM_HWCAP2_ARM_PMULL);
     GET_FEATURE(ARM_FEATURE_V8_SHA1, ARM_HWCAP2_ARM_SHA1);
     GET_FEATURE(ARM_FEATURE_V8_SHA256, ARM_HWCAP2_ARM_SHA2);
     GET_FEATURE(ARM_FEATURE_CRC, ARM_HWCAP2_ARM_CRC32);