aboutsummaryrefslogtreecommitdiff
path: root/tests/basic_test_failures.lst
blob: 65eafb8834df6c642cbed816816e21c9cf01ef0f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
Khronos Basic Test Failures
===========================
Khronos Test Version: OpenCL 1.2

Usage:  
% cd opencl_conformance/test_conformance/basic
% test_basic <test-name>


<test-name>: 
Failure Mode:
Analysis:

hiloeo
astype
prefetch
======
Failure Mode:   
------------
Runs out of system memory, and crashes the test.  
However, the test is passing all of the subtests before it crashes.

Analysis:
--------
valgrind analysis on shamrock showed huge memory leaks around creating and
deleting programs, which were due to LLVM objects not getting freed.  This
could either be a usage problem, or leaks in LLVM MCJIT execution engine.

Note also in llvm-src/tools/clang/include/clang/Frontend/CompilerInstance this
comment:
   // FIXME: Eliminate the llvm_shutdown requirement, that should either be part
   // of the context or else not CompilerInstance specific.
   bool ExecuteAction(FrontendAction &Act);

The Khronos tests do not call llvm_shutdown (nor should they), but also often
do not call clReleaseProgram() after calling clCreateProgram() many times in a
loop .

async_copy_global_to_local.txt
async_copy_local_to_global.txt
async_strided_copy_global_to_local.txt
async_strided_copy_local_to_global.txt
======================================
Failure Mode:   
------------
All of the above 4 tests fail in the same way:  Due to the Khronos generated
CL file not being able to compile.  These also fail the same way on 
Keystone II EVM (which doesn't use MCJIT).

async_copy_global_to_local...
Testing char
program.cl:9:153: error: used type 'event_t' where arithmetic or pointer type is required

ERROR: clBuildProgram failed! (CL_BUILD_PROGRAM_FAILURE from /home/gpitney/opencl_conformance/test_common/harness/kernelHelpers.c:35)
Original source is: ------------

__kernel void test_fn( const __global char *src, __global char *dst, __local char *localBuffer, int copiesPerWorkgroup, int copiesPerWorkItem )
{
 int i;
 for(i=0; i<copiesPerWorkItem; i++)
	 localBuffer[ get_local_id( 0 )*copiesPerWorkItem+i ] = (char)(char)0;
	barrier( CLK_LOCAL_MEM_FENCE );
	event_t event;
	event = async_work_group_copy( (__local char*)localBuffer, (__global const char*)(src+copiesPerWorkgroup*get_group_id(0)), (size_t)copiesPerWorkgroup, (event_t)0 );
	wait_group_events( 1, &event );
 for(i=0; i<copiesPerWorkItem; i++)
  dst[ get_global_id( 0 )*copiesPerWorkItem+i ] = localBuffer[ get_local_id( 0 )*copiesPerWorkItem+i ];
}
Build not successful for device "Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz", status: CL_BUILD_ERROR
Build log for device "Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz" is: ------------
program.cl:9:153: error: used type 'event_t' where arithmetic or pointer type is required

Analysis:
--------
Note the cast of (event_t)0 in the kernel above.
Per the discussion here: http://comments.gmane.org/gmane.comp.compilers.clang.scm/93008 , it appears the spec is vague on this point, but the Khronos 
test nevertheless expects the cast to compile.   

It seems the Khronos test and clang are in conflict.

local_kernel_scope
==================
Failure Mode:   
------------
The max of a set of unsigned integers computed from an OCL kernel differs
from the max of the same set computed by the host.

Error Message:
local_kernel_scope...
	Testing with 6 groups, 184 elements per group...
ERROR: Local max validation failed! (expected 4274779084, got 4290015211 for i=0)

Analysis:
--------
This test fails for numCPUs > 1, because a local variable defined in a kernel
is being allocated by clang into global memory, rather than thread local
storage.

For this OpenCL code:

     __kernel void test( __global unsigned int * input, __global unsigned int *outMaxes ) {
          __local unsigned int localStorage[256*4];
          [...]
           }

      The LLVM IR produced is:

             @test.localStorage = internal unnamed_addr addrspace(2) global [256 x i32] zeroinitializer, align 4


The expectation is that clang would have generated a thread_local attribute
on OpenCL __local variables, allowing the MCJIT/ARM backend to allocate
the variable localStorage into TLS at runtime.

kernel_memory_alignment_private
===============================
Failure Mode:
------------
[adding printing of the kernel from the test]
[...]
Testing short...
	Testing local kernel...
Kernel is:

kernel void test(global ulong *results)
{
   private short mem0[3];
   private short2 mem2[3];
   private short3 mem3[3];
   private short4 mem4[3];
   private short8 mem8[3];
   private short16 mem16[3];
   results[0] = (ulong)&mem0[0];
   results[1] = (ulong)&mem2[0];
   results[2] = (ulong)&mem3[0];
   results[3] = (ulong)&mem4[0];
   results[4] = (ulong)&mem8[0];
   results[5] = (ulong)&mem16[0];
}

	Vector size 16 failed: 0xb3b48b28 is not properly aligned.
[...]
[ more vector not aligned messages ensue for other types]

Analysis:
--------
Per the v1.2 spec, section 6.1.5 "Alignment of Types":

"The OpenCL compiler is responsible for aligning data items to the
appropriate alignment as required by the data type."

Dumping the module target datalayout gave the following:

   target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

I expect this is a possible issue with the LLVM backend, unless
there are some clang/OpenCL comiler options which might force alignment
different than what the LLVM target code generator is producing.