author | Peter Maydell <peter.maydell@linaro.org> | 2020-10-28 18:39:30 +0000 |
---|---|---|
committer | Peter Maydell <peter.maydell@linaro.org> | 2020-10-28 19:10:04 +0000 |
commit | 22c8808191adaa679d68f3fa473d8c473c4dad0c (patch) | |
tree | 175404da420c41029607b526d3dfd15cb8b5f33b | |
parent | 5f3837a15c0deda6a9520fd1ccf388206fd0fb64 (diff) |
target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts [test-neon-be-fixes]
The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.
For these insns, the index is indexing over groups of four 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)
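To illustrate why the swizzle matters, here is a minimal sketch that simulates the byte layout of one 64-bit register chunk on hosts of either endianness. It assumes, as is the case in QEMU to my understanding, that vector data is held as host-order 64-bit chunks and that H4() is the identity on little-endian hosts and an XOR with 1 on big-endian hosts. The names h4_sim, store_u64, load_u32 and read_elem32 are hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for H4(): identity on little-endian hosts,
 * XOR with 1 on big-endian hosts (assumed to match QEMU's macro). */
static unsigned h4_sim(unsigned index, int host_big_endian)
{
    return host_big_endian ? (index ^ 1u) : index;
}

/* Write a 64-bit chunk into buf in the byte order the host would use. */
static void store_u64(uint8_t *buf, uint64_t v, int host_big_endian)
{
    for (int i = 0; i < 8; i++) {
        int shift = host_big_endian ? (7 - i) * 8 : i * 8;
        buf[i] = (uint8_t)(v >> shift);
    }
}

/* Read a 32-bit value from buf in the byte order the host would use. */
static uint32_t load_u32(const uint8_t *buf, int host_big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = host_big_endian ? (3 - i) * 8 : i * 8;
        v |= (uint32_t)buf[i] << shift;
    }
    return v;
}

/* Fetch architectural 32-bit element `index` (0 or 1) of the chunk.
 * With use_swizzle == 0 this mimics the old "index * 4" computation;
 * with use_swizzle == 1 it mimics "H4(index) * 4". */
static uint32_t read_elem32(const uint8_t *chunk, unsigned index,
                            int host_big_endian, int use_swizzle)
{
    unsigned slot = use_swizzle ? h4_sim(index, host_big_endian) : index;
    return load_u32(chunk + slot * 4, host_big_endian);
}
```

If the chunk holds 0xBBBBBBBBAAAAAAAA (element 0 in the low 32 bits), the unswizzled read of index 0 on the simulated big-endian layout lands on element 1's bytes, while the swizzled read returns element 0. On the little-endian layout both computations agree, which matches the observation that the bug was invisible there.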
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I believe that gvec_udot_idx_h and gvec_sdot_idx_h are OK
because the index there is over groups of 4*16-bit values,
which are 64 bits each.
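The reasoning about element width can be sketched as a small calculation. This assumes vector data is stored as host-order 64-bit chunks, so that on a big-endian host the index of an element narrower than 64 bits must be XORed with (8 / element size in bytes) - 1; that gives 7, 3 and 1 for 1-, 2- and 4-byte elements, which to my understanding matches H1(), H2() and H4(). The helper names be_swizzle_mask and swizzle_index are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: XOR mask applied to an element index on a
 * big-endian host, for elements of elem_bytes bytes (1, 2, 4 or 8)
 * packed into host-order 64-bit chunks. */
static size_t be_swizzle_mask(size_t elem_bytes)
{
    return 8 / elem_bytes - 1;
}

/* Hypothetical helper: map an architectural element index to the
 * position of that element in host memory. */
static size_t swizzle_index(size_t index, size_t elem_bytes,
                            int host_big_endian)
{
    return host_big_endian ? (index ^ be_swizzle_mask(elem_bytes)) : index;
}
```

For 8-byte entities the mask is 0, so the index is unchanged on either kind of host. That is consistent with the note above that the _idx_h helpers, which index over 64-bit groups, need no swizzle, whereas the 32-bit groups of the _idx_b helpers do.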
-rw-r--r-- | target/arm/vec_helper.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 30d76d05be..0f33127c4c 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -293,7 +293,7 @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -324,7 +324,7 @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.