aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGil Pitney <gil.pitney@linaro.org>2014-12-16 10:34:29 -0800
committerGil Pitney <gil.pitney@linaro.org>2014-12-16 10:48:40 -0800
commiteaed9770913a70f23870fd92448cc8026b48a0a6 (patch)
tree0db733fcb324f6cc1977345bc724a099e28d71c7
parent4c383810169fe14b87a56d022b61f5155154f2e6 (diff)
downloadshamrock-Khronos_conformance.tar.gz
v0.9: Shamrock README and TODO listsshamrock_v0.9Khronos_conformance
and updated CREDITS Signed-off-by: Gil Pitney <gil.pitney@linaro.org>
-rw-r--r--CREDITS19
-rw-r--r--README (renamed from README.txt)64
-rw-r--r--TODO61
-rw-r--r--src/builtins/README.txt3
-rw-r--r--tests/basic_test_failures.lst323
5 files changed, 132 insertions, 338 deletions
diff --git a/CREDITS b/CREDITS
index c027de3..df199c0 100644
--- a/CREDITS
+++ b/CREDITS
@@ -1,7 +1,14 @@
-Denis Steckelmacher
-TI OpenCL team
-Tom Gall
-Gil Pitney
-Show Liu
-Jia Jia
+This software's geneology:
+ - Denis Steckelmacher: http://people.freedesktop.org/~steckdenis/clover
+ - TI OpenCL team: http://git.ti.com/opencl
+With some inclusions derived from:
+ - http://libclc.llvm.org
+ - pocl: http://portablecl.org
+Contributors at Linaro include:
+ - Tom Gall
+ - Gil Pitney
+ - Show Liu
+ - Jia Jia
+... and special thanks to
+ - Renato Golin (for help with clang/LLVM/ARM)
diff --git a/README.txt b/README
index 160fb95..6105f55 100644
--- a/README.txt
+++ b/README
@@ -1,4 +1,4 @@
-Shamrock: an OpenCL implementation based on clover
+Shamrock: an OpenCL 1.1 implementation based on clover
This is a continuation of the clover OpenCL project:
http://people.freedesktop.org/~steckdenis/clover
@@ -29,8 +29,8 @@ LLVM Configuration:
This was tested using LLVM 3.5.0 stable release from:
http://llvm.org/releases/download.html
-Note: LLVM must be configured and built with certain options to link with shamrock for
-ARM.
+Note: LLVM must be configured and built with certain options to link with
+shamrock for ARM.
The following creates a release build for ARM, with LLVM installed
into /opt/llvm:
@@ -47,26 +47,22 @@ Shamrock Build:
Current Branch: Khronos_conformance
Usage: cmake <project_src_dir> <optional_defines>*
- where <optional_defines*> are:
- -DPROJECT=shamrock | shannon | hawking
- -DLLVM_CONFIG_EXECUTABLE=<path to private llvm-config version>
-Note PROJECT=shamrock is default.
-The best way to compile is to use an out of src build, eg for a Debug build,
+If LLVM lives in a private path:
+ -DLLVM_CONFIG_EXECUTABLE=<path to private llvm-config version>
+If your Clang is installed to a different location than LLVM,
+then define CLANG_INCLUDE_DIR and CLANG_LIB_DIR on the cmake cmd line:
+ -DCLANG_INCLUDE_DIR=/opt/clang/include -DCLANG_LIB_DIR=/opt/clang/lib
+
+The best way to compile is to use an out of src build; eg for a Debug build,
and custom LLVM:
% mkdir shamrock_build
% cd shamrock_build
-% cmake -DLLVM_CONFIG_EXECUTABLE=/opt/llvm/bin/llvm-config -DCMAKE_BUILD_TYPE=Debug <path_to>/shamrock
+% cmake -DCMAKE_BUILD_TYPE=Debug <path_to>/shamrock
% make
% sudo make install
-If your Clang is installed to a different location than LLVM,
-then define CLANG_INCLUDE_DIR and CLANG_LIB_DIR on the cmake cmd line:
-
- -DCLANG_INCLUDE_DIR=/opt/clang/include -DCLANG_LIB_DIR=/opt/clang/lib
-
-
SANITY TESTS
============
@@ -81,25 +77,25 @@ Latest Results:
shamrock_build> make test
Running tests...
/usr/bin/ctest --force-new-ctest-process
-Test project /home/gpitney/shamrock_build
+Test project /home/user/shamrock_build
Start 1: platform
-1/8 Test #1: platform ......................... Passed 0.11 sec
+1/8 Test #1: platform ......................... Passed 0.02 sec
Start 2: device
-2/8 Test #2: device ........................... Passed 0.01 sec
+2/8 Test #2: device ........................... Passed 0.02 sec
Start 3: context
-3/8 Test #3: context .......................... Passed 0.01 sec
+3/8 Test #3: context .......................... Passed 0.02 sec
Start 4: commandqueue
-4/8 Test #4: commandqueue ..................... Passed 1.03 sec
+4/8 Test #4: commandqueue ..................... Passed 1.06 sec
Start 5: mem
-5/8 Test #5: mem .............................. Passed 0.01 sec
+5/8 Test #5: mem .............................. Passed 0.02 sec
Start 6: kernel
-6/8 Test #6: kernel ...........................***Failed 0.90 sec
+6/8 Test #6: kernel ........................... Passed 3.43 sec
Start 7: program
-7/8 Test #7: program .......................... Passed 2.17 sec
+7/8 Test #7: program .......................... Passed 5.21 sec
Start 8: builtins
-8/8 Test #8: builtins ......................... Passed 1.53 sec
+8/8 Test #8: builtins ......................... Passed 3.45 sec
-88% tests passed, 1 tests failed out of 8
+100% tests passed, 0 tests failed out of 8
PIGLIT TESTS
============
@@ -116,14 +112,18 @@ To run OpenCL tests, results in results/all_cl/main
% piglit run tests/all_cl results/all_cl
-DEBUGGING OpenCL Kernels:
+Debugging OpenCL Kernels:
=========================
-1. printf: A builtin function named "debug" maps to the printf symbol in the getBuiltin()
- callback function, allowing printf from OpenCL kernels. Alternatively, this mechanism
- can be used to define aribraty functions to be called back from kernels.
+1. printf: A prototype has been added to include/CL/cpu.h, allowing printf
+ to be called from OpenCL kernels.
+
+2. gdb:
+ The kernel is called from Coal::CPUKernelWorkGroup::run(), at the
+ line:
+ p_kernel_func_addr(p_args);
-2. gdb: Using the above getBuiltin() mechanism, a breakpoint can be placed in a callback
- function at kernel exit, then stepping back into the kernel via gdb, will allow
- debug of the kernel code (assembly level stepping).
+ Placing a breakpoint before this line, one can then put a breakpoint on
+ the kernel function itself, disassemble, and stepi through the
+ assembly code.
diff --git a/TODO b/TODO
index 84384e5..6a0f399 100644
--- a/TODO
+++ b/TODO
@@ -1,29 +1,64 @@
Things To Do:
-Features:
-=========
+Roadmap
+=======
+
+1. Resync with git.ti.com/opencl.
+
+ Update to latest common code from the TI repo.
+
+2. Merge TI OpenCL DSP Device support from git.ti.com/opencl.
+
+ Get the CPU and DSP device support to act as plugins to a common
+ framework, so that they may coexist.
+
+3. Validate CPU Device for ARMv8, x86_64
-1. Merge latest TI OpenCL DSP Device support
+ Ensure same tests that pass on ARM also work on x86.
-2. Update to OpenCL v 1.2
+4. Update to OpenCL v 1.2
+
+5. Move to SPIR.
+
+ Once a SPIR to ARM backend becomes available, move to SPIR as a more
+ standard IR.
Issues:
=======
-1. Khronos basic tests failures (both on ARM and x86_64), documented here:
+1. Khronos tests:
+
+ The majority of Khronos basic tests pass, proving that MCJIT is working
+ pretty well for ARM.
+
+ The remaining Khronos 'basic' test failures (ARM only), documented here:
+
+ tests/basic_test_failures.lst
-tests/basic_test_failures.lst
+ Some of the other Khronos tests behond "basic" have been run on ARM,
+ but the failures have yet to be analysed.
+ computeinfo: PASS
+ api: 55/61 PASS
+ basic: 87/95 PASS
+ buffers: 78/81 PASS
+ commonfns: 17/17 PASS
-2. Testing Full DSP Device support.
+2. Builtins:
-Requires extra TI DSP compiler, DSP side files and builtin library which are not
-currently released.
+ atomic builtins not yet implemented.
+ math builtins implemented (using libm or builtins), but not all pass the
+ Khronos criteria for accuracy.
-The build therefore defaults to SHAMROCK build type, for CPU Device only, and uses a CPU only
-builtins library. The clc.h, therefore, has not been tested with DSP Device builds.
+ The current clc.h has its roots in LLVM libclc, but has diverged. It
+ may be interesting to resync with that project, if those builtins
+ actually pass the Khronos tests.
-3. Sanity Test, one kernel test failure
+3. DSP Device support:
-See README.txt for latest results.
+ This requires extra TI DSP compiler, DSP side files and builtin library
+ support which is released by TI as part of its Keystone II MCSDK-HPC product.
+ Though the CMake files allow building for TI devices, all the DSP side files
+ are not in opensource, so the build defaults to SHAMROCK build type,
+ building for CPU Device only.
diff --git a/src/builtins/README.txt b/src/builtins/README.txt
index 5e16118..6e67eec 100644
--- a/src/builtins/README.txt
+++ b/src/builtins/README.txt
@@ -1,8 +1,7 @@
This directory (builtins) is intended to supercede src/runtime as a means
to provide a builtins library for OpenCL kernels.
-Note: some of the files here do not compile due to an address space casting
-error, and are suffixed *.cl.broken.
+Note: some of the files here do not compile yet, and are suffixed *.cl.broken.
Files here were imported from the TI opencl_builtins private repository and
repurposed for CPU device (from DSP device).
diff --git a/tests/basic_test_failures.lst b/tests/basic_test_failures.lst
index d5e4871..a0fff9c 100644
--- a/tests/basic_test_failures.lst
+++ b/tests/basic_test_failures.lst
@@ -12,6 +12,7 @@ Failure Mode:
Analysis:
hiloeo
+astype
======
Failure Mode:
------------
@@ -22,8 +23,17 @@ Analysis:
--------
valgrind analysis on shamrock showed huge memory leaks around creating and
deleting programs, which were due to LLVM objects not getting freed. This
-could either be a usage problem, or a bug in LLVM MCJIT execution engine.
+could either be a usage problem, or leaks in LLVM MCJIT execution engine.
+Note also in llvm-src/tools/clang/include/clang/Frontend/CompilerInstance this
+comment:
+ // FIXME: Eliminate the llvm_shutdown requirement, that should either be part
+ // of the context or else not CompilerInstance specific.
+ bool ExecuteAction(FrontendAction &Act);
+
+The Khronos tests do not call llvm_shutdown (nor should they), but also often
+do not call clReleaseProgram() after calling clCreateProgram() many times in a
+loop .
async_copy_global_to_local.txt
async_copy_local_to_global.txt
@@ -34,7 +44,7 @@ Failure Mode:
------------
All of the above 4 tests fail in the same way: Due to the Khronos generated
CL file not being able to compile. These also fail the same way on
-Keystone EVM (which doesn't use MCJIT).
+Keystone II EVM (which doesn't use MCJIT).
async_copy_global_to_local...
Testing char
@@ -65,12 +75,12 @@ Note the cast of (event_t)0 in the kernel above.
Per the discussion here: http://comments.gmane.org/gmane.comp.compilers.clang.scm/93008 , it appears the spec is vague on this point, but the Khronos
test nevertheless expects the cast to compile.
-So, it seems a clang patch for OpenCL event_t casts of zero may be required.
+It seems the Khronos test and clang are in conflict.
kernel_memory_alignment_constant.txt
====================================
Failure Mode:
-------------
+-------------
This fails due to inability to compile a Khronos test generated CL program.
@@ -86,7 +96,7 @@ program.cl:5:18: error: variable in constant address space must be initialized
program.cl:6:18: error: variable in constant address space must be initialized
program.cl:7:19: error: variable in constant address space must be initialized
-ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/opencl_conformance/test_common/harness/kernelHelpers.c:35)
+ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/open
Original source is: ------------
constant char mem0[3];
@@ -116,297 +126,40 @@ Some digging shows this clang error was added after LLVM 3.3
(LLVM version used by TI Keystone, which explains why it passes there):
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20131230/096405.html
-In this case, the LLVM clang compiler and the Khronos tests are in conflict.
-
-local_kernel_def.txt
-====================
-Failure Mode:
-------------
-This fails due to inability to compile a Khronos test generated CL program.
-
-
-local_kernel_def...
-program.cl:3:23: error: 'tmp_sum' declared as an array with a negative size
-
-ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/opencl_conformance/test_common/harness/kernelHelpers.c:35)
-Original source is: ------------
-__kernel void compute_sum_with_localmem(__global int *a, int n, __global int *sum)
-{
- __local int tmp_sum[-2147483648];
- int tid = get_local_id(0);
- int lsize = get_local_size(0);
- int i;
+In this case, the LLVM clang compiler and the Khronos tests seem to be
+in conflict.
-[... snip ...]
-Analysis:
---------
-This test also fails on Keystone, but the negative number is (-4).
-
-The Khronos test is casting a size_t value for work group size to an int,
-and printing it into the kernel string using the %d printf() modifier.
-This does not appear to be the right printf() modifier for a size_t, so
-the test code appears to be in error.
-
-parameter_types
-=================
+local_kernel_scope
+==================
Failure Mode:
------------
-Invalid results results returned from test-generated OCL kernel, which uses
-vector parameters of various sizes.
-
-[ ... snip ...]
-Testing vector size 4
-Kernel: __kernel void test_kernel(
-char4 c, uchar4 uc, short4 s, ushort4 us, int4 i, uint4 ui, float4 f,
-__global float4 *result)
-{
- result[0] = convert_float4(c);
- result[1] = convert_float4(uc);
- result[2] = convert_float4(s);
- result[3] = convert_float4(us);
- result[4] = convert_float4(i);
- result[5] = convert_float4(ui);
- result[6] = f;
-}
-
-Conversion from char4 failed: index 0 got 4.28107e-38, expected 0.
-Conversion from char4 failed: index 2 got 16, expected 2.
-Conversion from char4 failed: index 3 got 1, expected -3.
-Conversion from uchar4 failed: index 0 got 4.28107e-38, expected 16.
-Conversion from uchar4 failed: index 1 got -1, expected 1.
-Conversion from uchar4 failed: index 2 got 18, expected 2.
-Conversion from uchar4 failed: index 3 got 1, expected 3.
-Conversion from short4 failed: index 0 got -19, expected -17.
-Conversion from short4 failed: index 2 got 20, expected 2.
-Conversion from short4 failed: index 3 got 1, expected -3.
-Conversion from ushort4 failed: index 0 got -23, expected 18.
-Conversion from ushort4 failed: index 1 got -1, expected 1.
-Conversion from ushort4 failed: index 2 got 0, expected 2.
-Conversion from ushort4 failed: index 3 got 0, expected 3.
-Conversion from int4 failed: index 0 got 0, expected -19.
-Conversion from int4 failed: index 1 got 0, expected -1.
-Conversion from int4 failed: index 2 got 0, expected 2.
-Conversion from int4 failed: index 3 got 0, expected -3.
-Conversion from uint4 failed: index 0 got 0, expected 20.
-Conversion from uint4 failed: index 1 got 0, expected 1.
-Conversion from uint4 failed: index 2 got 0, expected 2.
-Conversion from uint4 failed: index 3 got 0, expected 3.
-Conversion from float4 failed: index 0 got 0, expected -23.
-Conversion from float4 failed: index 1 got 0, expected -1.
-Conversion from float4 failed: index 2 got 0, expected 2.
-Conversion from float4 failed: index 3 got 0, expected -3.
-Testing vector size 8
-Kernel: __kernel void test_kernel(
-char8 c, uchar8 uc, short8 s, ushort8 us, int8 i, uint8 ui, float8 f,
-__global float8 *result)
-{
- result[0] = convert_float8(c);
- result[1] = convert_float8(uc);
- result[2] = convert_float8(s);
- result[3] = convert_float8(us);
- result[4] = convert_float8(i);
- result[5] = convert_float8(ui);
- result[6] = f;
-}
-
-Conversion from char8 failed: index 0 got -5.99946e-08, expected 0.
-Conversion from char8 failed: index 2 got 16, expected 2.
-Conversion from char8 failed: index 3 got 1, expected -3.
-Conversion from char8 failed: index 4 got 4.28106e-38, expected 4.
-Conversion from char8 failed: index 5 got -1, expected -5.
-Conversion from char8 failed: index 6 got 18, expected 6.
-Conversion from char8 failed: index 7 got 1, expected -7.
-Conversion from uchar8 failed: index 0 got -19, expected 16.
-Conversion from uchar8 failed: index 1 got -1, expected 1.
-Conversion from uchar8 failed: index 2 got 20, expected 2.
-Conversion from uchar8 failed: index 3 got 1, expected 3.
-Conversion from uchar8 failed: index 4 got -5.99946e-08, expected 4.
-Conversion from uchar8 failed: index 5 got -1, expected 5.
-Conversion from uchar8 failed: index 6 got 0, expected 6.
-Conversion from uchar8 failed: index 7 got 0, expected 7.
-Conversion from short8 failed: index 0 got 0, expected -17.
-Conversion from short8 failed: index 1 got 0, expected -1.
-Conversion from short8 failed: index 2 got 0, expected 2.
-Conversion from short8 failed: index 3 got 0, expected -3.
-Conversion from short8 failed: index 4 got 0, expected 4.
-Conversion from short8 failed: index 5 got 0, expected -5.
-Conversion from short8 failed: index 6 got 0, expected 6.
-Conversion from short8 failed: index 7 got 0, expected -7.
-Conversion from ushort8 failed: index 0 got 0, expected 18.
-Conversion from ushort8 failed: index 1 got 0, expected 1.
-Conversion from ushort8 failed: index 2 got 0, expected 2.
-Conversion from ushort8 failed: index 3 got 0, expected 3.
-Conversion from ushort8 failed: index 4 got 0, expected 4.
-Conversion from ushort8 failed: index 5 got 0, expected 5.
-Conversion from ushort8 failed: index 6 got 0, expected 6.
-Conversion from ushort8 failed: index 7 got 0, expected 7.
-Conversion from int8 failed: index 0 got 0, expected -19.
-Conversion from int8 failed: index 1 got 0, expected -1.
-Conversion from int8 failed: index 2 got 0, expected 2.
-Conversion from int8 failed: index 3 got 0, expected -3.
-Conversion from int8 failed: index 4 got 0, expected 4.
-Conversion from int8 failed: index 5 got 0, expected -5.
-Conversion from int8 failed: index 6 got 0, expected 6.
-Conversion from int8 failed: index 7 got 0, expected -7.
-Conversion from uint8 failed: index 0 got 0, expected 20.
-Conversion from uint8 failed: index 1 got 0, expected 1.
-Conversion from uint8 failed: index 2 got 0, expected 2.
-Conversion from uint8 failed: index 3 got 0, expected 3.
-Conversion from uint8 failed: index 4 got 0, expected 4.
-Conversion from uint8 failed: index 5 got 0, expected 5.
-Conversion from uint8 failed: index 6 got 0, expected 6.
-Conversion from uint8 failed: index 7 got 0, expected 7.
-Conversion from float8 failed: index 0 got 0, expected -23.
-Conversion from float8 failed: index 1 got 0, expected -1.
-Conversion from float8 failed: index 2 got 0, expected 2.
-Conversion from float8 failed: index 3 got 0, expected -3.
-Conversion from float8 failed: index 4 got 0, expected 4.
-Conversion from float8 failed: index 5 got 0, expected -5.
-Conversion from float8 failed: index 6 got 0, expected 6.
-Conversion from float8 failed: index 7 got 0, expected -7.
-Testing vector size 16
-Kernel: __kernel void test_kernel(
-char16 c, uchar16 uc, short16 s, ushort16 us, int16 i, uint16 ui, float16 f,
-__global float16 *result)
-{
- result[0] = convert_float16(c);
- result[1] = convert_float16(uc);
- result[2] = convert_float16(s);
- result[3] = convert_float16(us);
- result[4] = convert_float16(i);
- result[5] = convert_float16(ui);
- result[6] = f;
-}
+The max of a set of unsigned integers computed from an OCL kernel differs
+from the max of the same set computed by the host.
-Conversion from char16 failed: index 0 got -7.22404e-06, expected 0.
-Conversion from char16 failed: index 2 got 16, expected 2.
-Conversion from char16 failed: index 3 got 1, expected -3.
-Conversion from char16 failed: index 4 got -3.96717e-07, expected 4.
-Conversion from char16 failed: index 5 got -1, expected -5.
-Conversion from char16 failed: index 6 got 18, expected 6.
-Conversion from char16 failed: index 7 got 1, expected -7.
-Conversion from char16 failed: index 8 got 0, expected 8.
-Conversion from char16 failed: index 9 got -1, expected -9.
-Conversion from char16 failed: index 10 got 20, expected 10.
-Conversion from char16 failed: index 11 got 1, expected -11.
-Conversion from char16 failed: index 12 got 4.28106e-38, expected 12.
-Conversion from char16 failed: index 13 got -1, expected -13.
-Conversion from char16 failed: index 14 got 0, expected 14.
-Conversion from char16 failed: index 15 got 0, expected -15.
-Conversion from uchar16 failed: index 0 got 0, expected 16.
-Conversion from uchar16 failed: index 1 got 0, expected 1.
-Conversion from uchar16 failed: index 2 got 0, expected 2.
-Conversion from uchar16 failed: index 3 got 0, expected 3.
-Conversion from uchar16 failed: index 4 got 0, expected 4.
-Conversion from uchar16 failed: index 5 got 0, expected 5.
-Conversion from uchar16 failed: index 6 got 0, expected 6.
-Conversion from uchar16 failed: index 7 got 0, expected 7.
-Conversion from uchar16 failed: index 8 got 0, expected 8.
-Conversion from uchar16 failed: index 9 got 0, expected 9.
-Conversion from uchar16 failed: index 10 got 0, expected 10.
-Conversion from uchar16 failed: index 11 got 0, expected 11.
-Conversion from uchar16 failed: index 12 got -3.96712e-07, expected 12.
-Conversion from uchar16 failed: index 13 got 0, expected 13.
-Conversion from uchar16 failed: index 14 got 0, expected 14.
-Conversion from uchar16 failed: index 15 got 0, expected 15.
-Conversion from short16 failed: index 0 got 0, expected -17.
-Conversion from short16 failed: index 1 got 0, expected -1.
-Conversion from short16 failed: index 2 got 0, expected 2.
-Conversion from short16 failed: index 3 got 0, expected -3.
-Conversion from short16 failed: index 4 got 0, expected 4.
-Conversion from short16 failed: index 5 got 0, expected -5.
-Conversion from short16 failed: index 6 got 0, expected 6.
-Conversion from short16 failed: index 7 got 0, expected -7.
-Conversion from short16 failed: index 8 got 0, expected 8.
-Conversion from short16 failed: index 9 got 0, expected -9.
-Conversion from short16 failed: index 10 got 0, expected 10.
-Conversion from short16 failed: index 11 got 0, expected -11.
-Conversion from short16 failed: index 12 got 0, expected 12.
-Conversion from short16 failed: index 13 got 0, expected -13.
-Conversion from short16 failed: index 14 got 0, expected 14.
-Conversion from short16 failed: index 15 got 0, expected -15.
-Conversion from ushort16 failed: index 0 got 0, expected 18.
-Conversion from ushort16 failed: index 1 got 0, expected 1.
-Conversion from ushort16 failed: index 2 got 0, expected 2.
-Conversion from ushort16 failed: index 3 got 0, expected 3.
-Conversion from ushort16 failed: index 4 got 0, expected 4.
-Conversion from ushort16 failed: index 5 got 0, expected 5.
-Conversion from ushort16 failed: index 6 got 0, expected 6.
-Conversion from ushort16 failed: index 7 got 0, expected 7.
-Conversion from ushort16 failed: index 8 got 0, expected 8.
-Conversion from ushort16 failed: index 9 got 0, expected 9.
-Conversion from ushort16 failed: index 10 got 0, expected 10.
-Conversion from ushort16 failed: index 11 got 0, expected 11.
-Conversion from ushort16 failed: index 12 got 0, expected 12.
-Conversion from ushort16 failed: index 13 got 0, expected 13.
-Conversion from ushort16 failed: index 14 got 0, expected 14.
-Conversion from ushort16 failed: index 15 got 0, expected 15.
-Conversion from int16 failed: index 0 got 0, expected -19.
-Conversion from int16 failed: index 1 got 0, expected -1.
-Conversion from int16 failed: index 2 got 0, expected 2.
-Conversion from int16 failed: index 3 got 0, expected -3.
-Conversion from int16 failed: index 4 got 0, expected 4.
-Conversion from int16 failed: index 5 got 0, expected -5.
-Conversion from int16 failed: index 6 got 0, expected 6.
-Conversion from int16 failed: index 7 got 0, expected -7.
-Conversion from int16 failed: index 8 got 0, expected 8.
-Conversion from int16 failed: index 9 got 0, expected -9.
-Conversion from int16 failed: index 10 got 0, expected 10.
-Conversion from int16 failed: index 11 got 0, expected -11.
-Conversion from int16 failed: index 12 got 0, expected 12.
-Conversion from int16 failed: index 13 got 0, expected -13.
-Conversion from int16 failed: index 14 got 0, expected 14.
-Conversion from int16 failed: index 15 got 0, expected -15.
-Conversion from uint16 failed: index 0 got 0, expected 20.
-Conversion from uint16 failed: index 1 got 0, expected 1.
-Conversion from uint16 failed: index 2 got 0, expected 2.
-Conversion from uint16 failed: index 3 got 0, expected 3.
-Conversion from uint16 failed: index 4 got 0, expected 4.
-Conversion from uint16 failed: index 5 got 0, expected 5.
-Conversion from uint16 failed: index 6 got 0, expected 6.
-Conversion from uint16 failed: index 7 got 0, expected 7.
-Conversion from uint16 failed: index 8 got 0, expected 8.
-Conversion from uint16 failed: index 9 got 0, expected 9.
-Conversion from uint16 failed: index 10 got 0, expected 10.
-Conversion from uint16 failed: index 11 got 0, expected 11.
-Conversion from uint16 failed: index 12 got 0, expected 12.
-Conversion from uint16 failed: index 13 got 0, expected 13.
-Conversion from uint16 failed: index 14 got 0, expected 14.
-Conversion from uint16 failed: index 15 got 0, expected 15.
-
-[ then crashes on next sub-tests]
+Error Message:
+local_kernel_scope...
+ Testing with 6 groups, 184 elements per group...
+ERROR: Local max validation failed! (expected 4274779084, got 4290015211 for i=0)
Analysis:
--------
-Some rather intense debugging found the culprit being the float<n> *
-results output kernel vector argument was being *modified* by the MCJIT
-generated ARM assembly kernel code! This was determined by gdb debugging
-via assembly into the JIT'd kernel, and also inserting callbacks to
-builtin funcitons to inspect the results pointer argument at entry and exit
-to and from the kernel function.
+This test fails for numCPUs > 1, because a local variable defined in a kernel
+is being allocated by clang into global memory, rather than thread local
+storage.
-After creating a simplified test case using lli, was able to reproduce the error
-and fix the issue by modifying the intermediate IR of the test case.
+For this OpenCL code:
-However, the same modifications translated into shamrock did not resolve the issue there.
+ __kernel void test( __global unsigned int * input, __global unsigned int *outMaxes ) {
+ __local unsigned int localStorage[256*4];
+ [...]
+ }
-This issue may be the cause of many of the other basic test failures which
-involve vector parameters used in JIT'd ARM kernels.
+ The LLVM IR produced is:
-These other tests fail due to unexpected results being returned from the JIT'ed
-kernels on ARM:
+ @test.localStorage = internal unnamed_addr addrspace(2) global [256 x i32] zeroinitializer, align 4
-local_kernel_scope
-explicit_s2v_<type>
-fpmath_float4
-intmath_int4
-intmath_long2
-
-TODO:
-=====
-kernel_memory_alignment_local - clSetKenrelArg failed.
-vload_local - clSetKernelArg failed
-vstore_local - clSetKernelArg failed
-local_arg_def - clCreateBuffer failed.
+The expectation is that clang would have generated a thread_local attribute
+on OpenCL __local variables, allowing the MCJIT/ARM backend to allocate
+the variable localStorage into TLS at runtime.