Khronos Basic Test Failures
===========================

Khronos Test Version: OpenCL 1.1: April 4, 2010.

Usage:

    % cd opencl_conformance/test_conformance/basic
    % test_basic

Each entry below names the failing test(s), followed by Failure Mode and
Analysis sections.

hiloeo
astype
======

Failure Mode:
-------------

Runs out of system memory and crashes the test, although all of the subtests
pass before the crash.

Analysis:
---------

A valgrind analysis on shamrock showed large memory leaks around creating and
deleting programs, caused by LLVM objects not being freed. This could be either
a usage problem or a leak in the LLVM MCJIT execution engine.

Note also this comment in
llvm-src/tools/clang/include/clang/Frontend/CompilerInstance:

    // FIXME: Eliminate the llvm_shutdown requirement, that should either be part
    // of the context or else not CompilerInstance specific.
    bool ExecuteAction(FrontendAction &Act);

The Khronos tests do not call llvm_shutdown (nor should they), and they also
often call clCreateProgram() many times in a loop without calling
clReleaseProgram() afterwards.

async_copy_global_to_local.txt
async_copy_local_to_global.txt
async_strided_copy_global_to_local.txt
async_strided_copy_local_to_global.txt
======================================

Failure Mode:
-------------

All four of the above tests fail the same way: the CL source generated by the
Khronos test does not compile. They also fail the same way on the Keystone II
EVM (which doesn't use MCJIT).

    async_copy_global_to_local...
    Testing char
    program.cl:9:153: error: used type 'event_t' where arithmetic or pointer type is required
    ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/opencl_conformance/test_common/harness/kernelHelpers.c:35)
    Original source is:
    ------------
    __kernel void test_fn( const __global char *src, __global char *dst, __local char *localBuffer, int copiesPerWorkgroup, int copiesPerWorkItem )
    {
    int i;
    for(i=0; i

[...] 1, because a local variable defined in a kernel is allocated by clang in
global memory (as a module-level global variable) rather than in thread-local
storage.

For this OpenCL code:

    __kernel void test( __global unsigned int * input, __global unsigned int *outMaxes )
    {
        __local unsigned int localStorage[256*4];
        [...]
    }

the LLVM IR produced is:

    @test.localStorage = internal unnamed_addr addrspace(2) global [256 x i32] zeroinitializer, align 4

The expectation is that clang would generate a thread_local attribute on OpenCL
__local variables, allowing the MCJIT/ARM backend to allocate localStorage in
TLS at runtime.
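The effect of emitting a __local array as a single module-level global can be sketched in plain C. This is a hypothetical model, not shamrock, clang, or Khronos code: `LOCAL_SIZE`, `shared_storage`, `fill_local`, `max_local`, and `run_two_groups` are made-up names, and the two "work-groups" are interleaved by hand on one thread to stand in for two cores running at the same time, rather than using the real OpenCL runtime or actual TLS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny stand-in for the kernel's
 * __local unsigned int localStorage[...] array. */
#define LOCAL_SIZE 4u

/* Models the observed lowering: ONE module-level array that every
 * work-group ends up sharing, whichever core runs it. */
static unsigned int shared_storage[LOCAL_SIZE];

/* Phase 1 of a hypothetical kernel: a work-group fills its __local
 * buffer with values derived from its group id. */
static void fill_local(unsigned int *local, unsigned int group_id)
{
    for (size_t i = 0; i < LOCAL_SIZE; i++)
        local[i] = group_id * 100u + (unsigned int)i;
}

/* Phase 2: the work-group reduces its __local buffer to a maximum,
 * standing in for writing outMaxes[group_id]. */
static unsigned int max_local(const unsigned int *local)
{
    unsigned int m = local[0];
    for (size_t i = 1; i < LOCAL_SIZE; i++)
        if (local[i] > m)
            m = local[i];
    return m;
}

/* Interleaves two work-groups the way two cores might: both run phase 1,
 * then both run phase 2. buf0/buf1 are the __local buffers each group sees;
 * passing the same pointer twice models the shared-global lowering. */
void run_two_groups(unsigned int *buf0, unsigned int *buf1,
                    unsigned int out_maxes[2])
{
    assert(buf0 != NULL && buf1 != NULL && out_maxes != NULL);
    fill_local(buf0, 0u);
    fill_local(buf1, 1u);
    out_maxes[0] = max_local(buf0);
    out_maxes[1] = max_local(buf1);
}
```

Calling `run_two_groups(shared_storage, shared_storage, out)` leaves 103 in both `out[0]` and `out[1]`: group 1's phase-1 writes overwrite group 0's data before group 0 reduces it. With two separate arrays, the results are 3 and 103 as intended. Per-thread storage (TLS) would give the second behavior, assuming each thread runs only one work-group at a time.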