diff options
author | Gil Pitney <gil.pitney@linaro.org> | 2014-12-16 10:34:29 -0800 |
---|---|---|
committer | Gil Pitney <gil.pitney@linaro.org> | 2014-12-16 10:48:40 -0800 |
commit | eaed9770913a70f23870fd92448cc8026b48a0a6 (patch) | |
tree | 0db733fcb324f6cc1977345bc724a099e28d71c7 | |
parent | 4c383810169fe14b87a56d022b61f5155154f2e6 (diff) |
v0.9: Shamrock README and TODO listsshamrock_v0.9Khronos_conformance
and updated CREDITS
Signed-off-by: Gil Pitney <gil.pitney@linaro.org>
-rw-r--r-- | CREDITS | 19 | ||||
-rw-r--r-- | README (renamed from README.txt) | 64 | ||||
-rw-r--r-- | TODO | 61 | ||||
-rw-r--r-- | src/builtins/README.txt | 3 | ||||
-rw-r--r-- | tests/basic_test_failures.lst | 323 |
5 files changed, 132 insertions, 338 deletions
@@ -1,7 +1,14 @@ -Denis Steckelmacher -TI OpenCL team -Tom Gall -Gil Pitney -Show Liu -Jia Jia +This software's geneology: + - Denis Steckelmacher: http://people.freedesktop.org/~steckdenis/clover + - TI OpenCL team: http://git.ti.com/opencl +With some inclusions derived from: + - http://libclc.llvm.org + - pocl: http://portablecl.org +Contributors at Linaro include: + - Tom Gall + - Gil Pitney + - Show Liu + - Jia Jia +... and special thanks to + - Renato Golin (for help with clang/LLVM/ARM) @@ -1,4 +1,4 @@ -Shamrock: an OpenCL implementation based on clover +Shamrock: an OpenCL 1.1 implementation based on clover This is a continuation of the clover OpenCL project: http://people.freedesktop.org/~steckdenis/clover @@ -29,8 +29,8 @@ LLVM Configuration: This was tested using LLVM 3.5.0 stable release from: http://llvm.org/releases/download.html -Note: LLVM must be configured and built with certain options to link with shamrock for -ARM. +Note: LLVM must be configured and built with certain options to link with +shamrock for ARM. The following creates a release build for ARM, with LLVM installed into /opt/llvm: @@ -47,26 +47,22 @@ Shamrock Build: Current Branch: Khronos_conformance Usage: cmake <project_src_dir> <optional_defines>* - where <optional_defines*> are: - -DPROJECT=shamrock | shannon | hawking - -DLLVM_CONFIG_EXECUTABLE=<path to private llvm-config version> -Note PROJECT=shamrock is default. -The best way to compile is to use an out of src build, eg for a Debug build, +If LLVM lives in a private path: + -DLLVM_CONFIG_EXECUTABLE=<path to private llvm-config version> +If your Clang is installed to a different location than LLVM, +then define CLANG_INCLUDE_DIR and CLANG_LIB_DIR on the cmake cmd line: + -DCLANG_INCLUDE_DIR=/opt/clang/include -DCLANG_LIB_DIR=/opt/clang/lib + +The best way to compile is to use an out of src build; eg for a Debug build, and custom LLVM: % mkdir shamrock_build % cd shamrock_build -% cmake -DLLVM_CONFIG_EXECUTABLE=/opt/llvm/bin/llvm-config -DCMAKE_BUILD_TYPE=Debug <path_to>/shamrock +% cmake -DCMAKE_BUILD_TYPE=Debug <path_to>/shamrock % make % sudo make install -If your Clang is installed to a different location than LLVM, -then define CLANG_INCLUDE_DIR and CLANG_LIB_DIR on the cmake cmd line: - - -DCLANG_INCLUDE_DIR=/opt/clang/include -DCLANG_LIB_DIR=/opt/clang/lib - - SANITY TESTS ============ @@ -81,25 +77,25 @@ Latest Results: shamrock_build> make test Running tests... /usr/bin/ctest --force-new-ctest-process -Test project /home/gpitney/shamrock_build +Test project /home/user/shamrock_build Start 1: platform -1/8 Test #1: platform ......................... Passed 0.11 sec +1/8 Test #1: platform ......................... Passed 0.02 sec Start 2: device -2/8 Test #2: device ........................... Passed 0.01 sec +2/8 Test #2: device ........................... Passed 0.02 sec Start 3: context -3/8 Test #3: context .......................... Passed 0.01 sec +3/8 Test #3: context .......................... Passed 0.02 sec Start 4: commandqueue -4/8 Test #4: commandqueue ..................... Passed 1.03 sec +4/8 Test #4: commandqueue ..................... Passed 1.06 sec Start 5: mem -5/8 Test #5: mem .............................. Passed 0.01 sec +5/8 Test #5: mem .............................. Passed 0.02 sec Start 6: kernel -6/8 Test #6: kernel ...........................***Failed 0.90 sec +6/8 Test #6: kernel ........................... Passed 3.43 sec Start 7: program -7/8 Test #7: program .......................... Passed 2.17 sec +7/8 Test #7: program .......................... Passed 5.21 sec Start 8: builtins -8/8 Test #8: builtins ......................... Passed 1.53 sec +8/8 Test #8: builtins ......................... Passed 3.45 sec -88% tests passed, 1 tests failed out of 8 +100% tests passed, 0 tests failed out of 8 PIGLIT TESTS ============ @@ -116,14 +112,18 @@ To run OpenCL tests, results in results/all_cl/main % piglit run tests/all_cl results/all_cl -DEBUGGING OpenCL Kernels: +Debugging OpenCL Kernels: ========================= -1. printf: A builtin function named "debug" maps to the printf symbol in the getBuiltin() - callback function, allowing printf from OpenCL kernels. Alternatively, this mechanism - can be used to define aribraty functions to be called back from kernels. +1. printf: A prototype has been added to include/CL/cpu.h, allowing printf + to be called from OpenCL kernels. + +2. gdb: + The kernel is called from Coal::CPUKernelWorkGroup::run(), at the + line: + p_kernel_func_addr(p_args); -2. gdb: Using the above getBuiltin() mechanism, a breakpoint can be placed in a callback - function at kernel exit, then stepping back into the kernel via gdb, will allow - debug of the kernel code (assembly level stepping). + Placing a breakpoint before this line, one can then put a breakpoint on + the kernel function itself, disassemble, and stepi through the + assembly code. @@ -1,29 +1,64 @@ Things To Do: -Features: -========= +Roadmap +======= + +1. Resync with git.ti.com/opencl. + + Update to latest common code from the TI repo. + +2. Merge TI OpenCL DSP Device support from git.ti.com/opencl. + + Get the CPU and DSP device support to act as plugins to a common + framework, so that they may coexist. + +3. Validate CPU Device for ARMv8, x86_64 -1. Merge latest TI OpenCL DSP Device support + Ensure same tests that pass on ARM also work on x86. -2. Update to OpenCL v 1.2 +4. Update to OpenCL v 1.2 + +5. Move to SPIR. + + Once a SPIR to ARM backend becomes available, move to SPIR as a more + standard IR. Issues: ======= -1. Khronos basic tests failures (both on ARM and x86_64), documented here: +1. Khronos tests: + + The majority of Khronos basic tests pass, proving that MCJIT is working + pretty well for ARM. + + The remaining Khronos 'basic' test failures (ARM only), documented here: + + tests/basic_test_failures.lst -tests/basic_test_failures.lst + Some of the other Khronos tests behond "basic" have been run on ARM, + but the failures have yet to be analysed. + computeinfo: PASS + api: 55/61 PASS + basic: 87/95 PASS + buffers: 78/81 PASS + commonfns: 17/17 PASS -2. Testing Full DSP Device support. +2. Builtins: -Requires extra TI DSP compiler, DSP side files and builtin library which are not -currently released. + atomic builtins not yet implemented. + math builtins implemented (using libm or builtins), but not all pass the + Khronos criteria for accuracy. -The build therefore defaults to SHAMROCK build type, for CPU Device only, and uses a CPU only -builtins library. The clc.h, therefore, has not been tested with DSP Device builds. + The current clc.h has its roots in LLVM libclc, but has diverged. It + may be interesting to resync with that project, if those builtins + actually pass the Khronos tests. -3. Sanity Test, one kernel test failure +3. DSP Device support: -See README.txt for latest results. + This requires extra TI DSP compiler, DSP side files and builtin library + support which is released by TI as part of its Keystone II MCSDK-HPC product. + Though the CMake files allow building for TI devices, all the DSP side files + are not in opensource, so the build defaults to SHAMROCK build type, + building for CPU Device only. diff --git a/src/builtins/README.txt b/src/builtins/README.txt index 5e16118..6e67eec 100644 --- a/src/builtins/README.txt +++ b/src/builtins/README.txt @@ -1,8 +1,7 @@ This directory (builtins) is intended to supercede src/runtime as a means to provide a builtins library for OpenCL kernels. -Note: some of the files here do not compile due to an address space casting -error, and are suffixed *.cl.broken. +Note: some of the files here do not compile yet, and are suffixed *.cl.broken. Files here were imported from the TI opencl_builtins private repository and repurposed for CPU device (from DSP device). diff --git a/tests/basic_test_failures.lst b/tests/basic_test_failures.lst index d5e4871..a0fff9c 100644 --- a/tests/basic_test_failures.lst +++ b/tests/basic_test_failures.lst @@ -12,6 +12,7 @@ Failure Mode: Analysis: hiloeo +astype ====== Failure Mode: ------------ @@ -22,8 +23,17 @@ Analysis: -------- valgrind analysis on shamrock showed huge memory leaks around creating and deleting programs, which were due to LLVM objects not getting freed. This -could either be a usage problem, or a bug in LLVM MCJIT execution engine. +could either be a usage problem, or leaks in LLVM MCJIT execution engine. +Note also in llvm-src/tools/clang/include/clang/Frontend/CompilerInstance this +comment: + // FIXME: Eliminate the llvm_shutdown requirement, that should either be part + // of the context or else not CompilerInstance specific. + bool ExecuteAction(FrontendAction &Act); + +The Khronos tests do not call llvm_shutdown (nor should they), but also often +do not call clReleaseProgram() after calling clCreateProgram() many times in a +loop . async_copy_global_to_local.txt async_copy_local_to_global.txt @@ -34,7 +44,7 @@ Failure Mode: ------------ All of the above 4 tests fail in the same way: Due to the Khronos generated CL file not being able to compile. These also fail the same way on -Keystone EVM (which doesn't use MCJIT). +Keystone II EVM (which doesn't use MCJIT). async_copy_global_to_local... Testing char @@ -65,12 +75,12 @@ Note the cast of (event_t)0 in the kernel above. Per the discussion here: http://comments.gmane.org/gmane.comp.compilers.clang.scm/93008 , it appears the spec is vague on this point, but the Khronos test nevertheless expects the cast to compile. -So, it seems a clang patch for OpenCL event_t casts of zero may be required. +It seems the Khronos test and clang are in conflict. kernel_memory_alignment_constant.txt ==================================== Failure Mode: ------------- +------------- This fails due to inability to compile a Khronos test generated CL program. @@ -86,7 +96,7 @@ program.cl:5:18: error: variable in constant address space must be initialized program.cl:6:18: error: variable in constant address space must be initialized program.cl:7:19: error: variable in constant address space must be initialized -ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/opencl_conformance/test_common/harness/kernelHelpers.c:35) +ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/open Original source is: ------------ constant char mem0[3]; @@ -116,297 +126,40 @@ Some digging shows this clang error was added after LLVM 3.3 (LLVM version used by TI Keystone, which explains why it passes there): http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20131230/096405.html -In this case, the LLVM clang compiler and the Khronos tests are in conflict. - -local_kernel_def.txt -==================== -Failure Mode: ------------- -This fails due to inability to compile a Khronos test generated CL program. - - -local_kernel_def... -program.cl:3:23: error: 'tmp_sum' declared as an array with a negative size - -ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/opencl_conformance/test_common/harness/kernelHelpers.c:35) -Original source is: ------------ -__kernel void compute_sum_with_localmem(__global int *a, int n, __global int *sum) -{ - __local int tmp_sum[-2147483648]; - int tid = get_local_id(0); - int lsize = get_local_size(0); - int i; +In this case, the LLVM clang compiler and the Khronos tests seem to be +in conflict. -[... snip ...] -Analysis: --------- -This test also fails on Keystone, but the negative number is (-4). - -The Khronos test is casting a size_t value for work group size to an int, -and printing it into the kernel string using the %d printf() modifier. -This does not appear to be the right printf() modifier for a size_t, so -the test code appears to be in error. - -parameter_types -================= +local_kernel_scope +================== Failure Mode: ------------ -Invalid results results returned from test-generated OCL kernel, which uses -vector parameters of various sizes. - -[ ... snip ...] -Testing vector size 4 -Kernel: __kernel void test_kernel( -char4 c, uchar4 uc, short4 s, ushort4 us, int4 i, uint4 ui, float4 f, -__global float4 *result) -{ - result[0] = convert_float4(c); - result[1] = convert_float4(uc); - result[2] = convert_float4(s); - result[3] = convert_float4(us); - result[4] = convert_float4(i); - result[5] = convert_float4(ui); - result[6] = f; -} - -Conversion from char4 failed: index 0 got 4.28107e-38, expected 0. -Conversion from char4 failed: index 2 got 16, expected 2. -Conversion from char4 failed: index 3 got 1, expected -3. -Conversion from uchar4 failed: index 0 got 4.28107e-38, expected 16. -Conversion from uchar4 failed: index 1 got -1, expected 1. -Conversion from uchar4 failed: index 2 got 18, expected 2. -Conversion from uchar4 failed: index 3 got 1, expected 3. -Conversion from short4 failed: index 0 got -19, expected -17. -Conversion from short4 failed: index 2 got 20, expected 2. -Conversion from short4 failed: index 3 got 1, expected -3. -Conversion from ushort4 failed: index 0 got -23, expected 18. -Conversion from ushort4 failed: index 1 got -1, expected 1. -Conversion from ushort4 failed: index 2 got 0, expected 2. -Conversion from ushort4 failed: index 3 got 0, expected 3. -Conversion from int4 failed: index 0 got 0, expected -19. -Conversion from int4 failed: index 1 got 0, expected -1. -Conversion from int4 failed: index 2 got 0, expected 2. -Conversion from int4 failed: index 3 got 0, expected -3. -Conversion from uint4 failed: index 0 got 0, expected 20. -Conversion from uint4 failed: index 1 got 0, expected 1. -Conversion from uint4 failed: index 2 got 0, expected 2. -Conversion from uint4 failed: index 3 got 0, expected 3. -Conversion from float4 failed: index 0 got 0, expected -23. -Conversion from float4 failed: index 1 got 0, expected -1. -Conversion from float4 failed: index 2 got 0, expected 2. -Conversion from float4 failed: index 3 got 0, expected -3. -Testing vector size 8 -Kernel: __kernel void test_kernel( -char8 c, uchar8 uc, short8 s, ushort8 us, int8 i, uint8 ui, float8 f, -__global float8 *result) -{ - result[0] = convert_float8(c); - result[1] = convert_float8(uc); - result[2] = convert_float8(s); - result[3] = convert_float8(us); - result[4] = convert_float8(i); - result[5] = convert_float8(ui); - result[6] = f; -} - -Conversion from char8 failed: index 0 got -5.99946e-08, expected 0. -Conversion from char8 failed: index 2 got 16, expected 2. -Conversion from char8 failed: index 3 got 1, expected -3. -Conversion from char8 failed: index 4 got 4.28106e-38, expected 4. -Conversion from char8 failed: index 5 got -1, expected -5. -Conversion from char8 failed: index 6 got 18, expected 6. -Conversion from char8 failed: index 7 got 1, expected -7. -Conversion from uchar8 failed: index 0 got -19, expected 16. -Conversion from uchar8 failed: index 1 got -1, expected 1. -Conversion from uchar8 failed: index 2 got 20, expected 2. -Conversion from uchar8 failed: index 3 got 1, expected 3. -Conversion from uchar8 failed: index 4 got -5.99946e-08, expected 4. -Conversion from uchar8 failed: index 5 got -1, expected 5. -Conversion from uchar8 failed: index 6 got 0, expected 6. -Conversion from uchar8 failed: index 7 got 0, expected 7. -Conversion from short8 failed: index 0 got 0, expected -17. -Conversion from short8 failed: index 1 got 0, expected -1. -Conversion from short8 failed: index 2 got 0, expected 2. -Conversion from short8 failed: index 3 got 0, expected -3. -Conversion from short8 failed: index 4 got 0, expected 4. -Conversion from short8 failed: index 5 got 0, expected -5. -Conversion from short8 failed: index 6 got 0, expected 6. -Conversion from short8 failed: index 7 got 0, expected -7. -Conversion from ushort8 failed: index 0 got 0, expected 18. -Conversion from ushort8 failed: index 1 got 0, expected 1. -Conversion from ushort8 failed: index 2 got 0, expected 2. -Conversion from ushort8 failed: index 3 got 0, expected 3. -Conversion from ushort8 failed: index 4 got 0, expected 4. -Conversion from ushort8 failed: index 5 got 0, expected 5. -Conversion from ushort8 failed: index 6 got 0, expected 6. -Conversion from ushort8 failed: index 7 got 0, expected 7. -Conversion from int8 failed: index 0 got 0, expected -19. -Conversion from int8 failed: index 1 got 0, expected -1. -Conversion from int8 failed: index 2 got 0, expected 2. -Conversion from int8 failed: index 3 got 0, expected -3. -Conversion from int8 failed: index 4 got 0, expected 4. -Conversion from int8 failed: index 5 got 0, expected -5. -Conversion from int8 failed: index 6 got 0, expected 6. -Conversion from int8 failed: index 7 got 0, expected -7. -Conversion from uint8 failed: index 0 got 0, expected 20. -Conversion from uint8 failed: index 1 got 0, expected 1. -Conversion from uint8 failed: index 2 got 0, expected 2. -Conversion from uint8 failed: index 3 got 0, expected 3. -Conversion from uint8 failed: index 4 got 0, expected 4. -Conversion from uint8 failed: index 5 got 0, expected 5. -Conversion from uint8 failed: index 6 got 0, expected 6. -Conversion from uint8 failed: index 7 got 0, expected 7. -Conversion from float8 failed: index 0 got 0, expected -23. -Conversion from float8 failed: index 1 got 0, expected -1. -Conversion from float8 failed: index 2 got 0, expected 2. -Conversion from float8 failed: index 3 got 0, expected -3. -Conversion from float8 failed: index 4 got 0, expected 4. -Conversion from float8 failed: index 5 got 0, expected -5. -Conversion from float8 failed: index 6 got 0, expected 6. -Conversion from float8 failed: index 7 got 0, expected -7. -Testing vector size 16 -Kernel: __kernel void test_kernel( -char16 c, uchar16 uc, short16 s, ushort16 us, int16 i, uint16 ui, float16 f, -__global float16 *result) -{ - result[0] = convert_float16(c); - result[1] = convert_float16(uc); - result[2] = convert_float16(s); - result[3] = convert_float16(us); - result[4] = convert_float16(i); - result[5] = convert_float16(ui); - result[6] = f; -} +The max of a set of unsigned integers computed from an OCL kernel differs +from the max of the same set computed by the host. -Conversion from char16 failed: index 0 got -7.22404e-06, expected 0. -Conversion from char16 failed: index 2 got 16, expected 2. -Conversion from char16 failed: index 3 got 1, expected -3. -Conversion from char16 failed: index 4 got -3.96717e-07, expected 4. -Conversion from char16 failed: index 5 got -1, expected -5. -Conversion from char16 failed: index 6 got 18, expected 6. -Conversion from char16 failed: index 7 got 1, expected -7. -Conversion from char16 failed: index 8 got 0, expected 8. -Conversion from char16 failed: index 9 got -1, expected -9. -Conversion from char16 failed: index 10 got 20, expected 10. -Conversion from char16 failed: index 11 got 1, expected -11. -Conversion from char16 failed: index 12 got 4.28106e-38, expected 12. -Conversion from char16 failed: index 13 got -1, expected -13. -Conversion from char16 failed: index 14 got 0, expected 14. -Conversion from char16 failed: index 15 got 0, expected -15. -Conversion from uchar16 failed: index 0 got 0, expected 16. -Conversion from uchar16 failed: index 1 got 0, expected 1. -Conversion from uchar16 failed: index 2 got 0, expected 2. -Conversion from uchar16 failed: index 3 got 0, expected 3. -Conversion from uchar16 failed: index 4 got 0, expected 4. -Conversion from uchar16 failed: index 5 got 0, expected 5. -Conversion from uchar16 failed: index 6 got 0, expected 6. -Conversion from uchar16 failed: index 7 got 0, expected 7. -Conversion from uchar16 failed: index 8 got 0, expected 8. -Conversion from uchar16 failed: index 9 got 0, expected 9. -Conversion from uchar16 failed: index 10 got 0, expected 10. -Conversion from uchar16 failed: index 11 got 0, expected 11. -Conversion from uchar16 failed: index 12 got -3.96712e-07, expected 12. -Conversion from uchar16 failed: index 13 got 0, expected 13. -Conversion from uchar16 failed: index 14 got 0, expected 14. -Conversion from uchar16 failed: index 15 got 0, expected 15. -Conversion from short16 failed: index 0 got 0, expected -17. -Conversion from short16 failed: index 1 got 0, expected -1. -Conversion from short16 failed: index 2 got 0, expected 2. -Conversion from short16 failed: index 3 got 0, expected -3. -Conversion from short16 failed: index 4 got 0, expected 4. -Conversion from short16 failed: index 5 got 0, expected -5. -Conversion from short16 failed: index 6 got 0, expected 6. -Conversion from short16 failed: index 7 got 0, expected -7. -Conversion from short16 failed: index 8 got 0, expected 8. -Conversion from short16 failed: index 9 got 0, expected -9. -Conversion from short16 failed: index 10 got 0, expected 10. -Conversion from short16 failed: index 11 got 0, expected -11. -Conversion from short16 failed: index 12 got 0, expected 12. -Conversion from short16 failed: index 13 got 0, expected -13. -Conversion from short16 failed: index 14 got 0, expected 14. -Conversion from short16 failed: index 15 got 0, expected -15. -Conversion from ushort16 failed: index 0 got 0, expected 18. -Conversion from ushort16 failed: index 1 got 0, expected 1. -Conversion from ushort16 failed: index 2 got 0, expected 2. -Conversion from ushort16 failed: index 3 got 0, expected 3. -Conversion from ushort16 failed: index 4 got 0, expected 4. -Conversion from ushort16 failed: index 5 got 0, expected 5. -Conversion from ushort16 failed: index 6 got 0, expected 6. -Conversion from ushort16 failed: index 7 got 0, expected 7. -Conversion from ushort16 failed: index 8 got 0, expected 8. -Conversion from ushort16 failed: index 9 got 0, expected 9. -Conversion from ushort16 failed: index 10 got 0, expected 10. -Conversion from ushort16 failed: index 11 got 0, expected 11. -Conversion from ushort16 failed: index 12 got 0, expected 12. -Conversion from ushort16 failed: index 13 got 0, expected 13. -Conversion from ushort16 failed: index 14 got 0, expected 14. -Conversion from ushort16 failed: index 15 got 0, expected 15. -Conversion from int16 failed: index 0 got 0, expected -19. -Conversion from int16 failed: index 1 got 0, expected -1. -Conversion from int16 failed: index 2 got 0, expected 2. -Conversion from int16 failed: index 3 got 0, expected -3. -Conversion from int16 failed: index 4 got 0, expected 4. -Conversion from int16 failed: index 5 got 0, expected -5. -Conversion from int16 failed: index 6 got 0, expected 6. -Conversion from int16 failed: index 7 got 0, expected -7. -Conversion from int16 failed: index 8 got 0, expected 8. -Conversion from int16 failed: index 9 got 0, expected -9. -Conversion from int16 failed: index 10 got 0, expected 10. -Conversion from int16 failed: index 11 got 0, expected -11. -Conversion from int16 failed: index 12 got 0, expected 12. -Conversion from int16 failed: index 13 got 0, expected -13. -Conversion from int16 failed: index 14 got 0, expected 14. -Conversion from int16 failed: index 15 got 0, expected -15. -Conversion from uint16 failed: index 0 got 0, expected 20. -Conversion from uint16 failed: index 1 got 0, expected 1. -Conversion from uint16 failed: index 2 got 0, expected 2. -Conversion from uint16 failed: index 3 got 0, expected 3. -Conversion from uint16 failed: index 4 got 0, expected 4. -Conversion from uint16 failed: index 5 got 0, expected 5. -Conversion from uint16 failed: index 6 got 0, expected 6. -Conversion from uint16 failed: index 7 got 0, expected 7. -Conversion from uint16 failed: index 8 got 0, expected 8. -Conversion from uint16 failed: index 9 got 0, expected 9. -Conversion from uint16 failed: index 10 got 0, expected 10. -Conversion from uint16 failed: index 11 got 0, expected 11. -Conversion from uint16 failed: index 12 got 0, expected 12. -Conversion from uint16 failed: index 13 got 0, expected 13. -Conversion from uint16 failed: index 14 got 0, expected 14. -Conversion from uint16 failed: index 15 got 0, expected 15. - -[ then crashes on next sub-tests] +Error Message: +local_kernel_scope... + Testing with 6 groups, 184 elements per group... +ERROR: Local max validation failed! (expected 4274779084, got 4290015211 for i=0) Analysis: -------- -Some rather intense debugging found the culprit being the float<n> * -results output kernel vector argument was being *modified* by the MCJIT -generated ARM assembly kernel code! This was determined by gdb debugging -via assembly into the JIT'd kernel, and also inserting callbacks to -builtin funcitons to inspect the results pointer argument at entry and exit -to and from the kernel function. +This test fails for numCPUs > 1, because a local variable defined in a kernel +is being allocated by clang into global memory, rather than thread local +storage. -After creating a simplified test case using lli, was able to reproduce the error -and fix the issue by modifying the intermediate IR of the test case. +For this OpenCL code: -However, the same modifications translated into shamrock did not resolve the issue there. + __kernel void test( __global unsigned int * input, __global unsigned int *outMaxes ) { + __local unsigned int localStorage[256*4]; + [...] + } -This issue may be the cause of many of the other basic test failures which -involve vector parameters used in JIT'd ARM kernels. + The LLVM IR produced is: -These other tests fail due to unexpected results being returned from the JIT'ed -kernels on ARM: + @test.localStorage = internal unnamed_addr addrspace(2) global [256 x i32] zeroinitializer, align 4 -local_kernel_scope -explicit_s2v_<type> -fpmath_float4 -intmath_int4 -intmath_long2 - -TODO: -===== -kernel_memory_alignment_local - clSetKenrelArg failed. -vload_local - clSetKernelArg failed -vstore_local - clSetKernelArg failed -local_arg_def - clCreateBuffer failed. +The expectation is that clang would have generated a thread_local attribute +on OpenCL __local variables, allowing the MCJIT/ARM backend to allocate +the variable localStorage into TLS at runtime. |