1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
|
*-----------------------------------------------------------------------------
Open CL(TM) 1.1 Product Version 0.4.0, from Texas Instruments, Inc.
*-----------------------------------------------------------------------------
*-----------------------------------------------------------------------------
* INSTALLATION
*-----------------------------------------------------------------------------
1) This installation is not setup to coexist with other versions of this
product. This is due to environment variables that point into the
installation. Therefore, you should uninstall all previous versions of the
TI OpenCL product before installation of this version.
2) The installation program modifies your shell's .rc file (e.g. .bashrc,
.tcshrc, .cshrc, etc) to create or append to three environment variables,
PATH, LD_LIBRARY_PATH, TI_OCL_INSTALL. You will need to re-source that .rc
file for the changes to take effect.
3) The OpenCL product or some of the examples in the product are dependent on
Ubuntu packages that are not typically installed by default. This step will
install these packages and will require sudo privileges or an administrator
to execute. Execute the following commands:
sudo apt-get install libpciaccess-dev binutils-dev ia32-libs libsdl1.2-dev
sudo apt-get install mesa-common-dev
Note 1) The OpenCL product is dependent on a kernel module that allows for
contiguous memory allocation on the Linux host. Kernel modules can be
specific to the Linux kernel version you are running. This package
contains the source for the module and is custom built for your linux
version as part of the installation process. The install (and uninstall)
package does require sudo privileges for portions of the install process
and will request an administrator password. The scripts
$TI_OCL_INSTALL/scripts/install.sh and $TI_OCL_INSTALL/scripts/uninstall.sh
can be inspected for details on the commands that are run and require sudo
privilege.
*-----------------------------------------------------------------------------
* SUMMARY OF DELTAS 0.1.6 from 0.1.5
*-----------------------------------------------------------------------------
*-----------------------------------------------------------------------------
* SUMMARY OF DELTAS 0.1.5 from 0.1.4
*-----------------------------------------------------------------------------
- More reliable installation and uninstallation of the cmem module
- Updated the C66 compiler tools to be based on version 7.5.0A13072
*-----------------------------------------------------------------------------
* SUMMARY OF DELTAS 0.1.4 from 0.1.3
*-----------------------------------------------------------------------------
- Updated to use the TI Desktop Linux SDK version 01.00.00.07
- Updated internal LLVM usage from version 3.0 to 3.2 libraries
- More accurate handling of the DSPC8682
- Increased the OpenCL global buffer area from 992M to 1023M
- Increased the OpenCL local buffer area from 128K to 256K
- The default speed of the DSP was modified from 1.25 Ghz to 1.00 Ghz.
This was due to the fact that most of the devices on the Advantech cards
are qualified for only 1.00 Ghz and some instability was seen running at
1.25 Ghz. See below for an environment variable you can set that will
change the DSP speed back to 1.25 GHz.
- Added logic to reset certain persistent configurations of the DSP device
that could cause incorrect behavior when intermixing the run of an opencl
application with a non opencl application using the DSP devices in
a conflicting manner.
- General bug fixes and stability improvements.
*-----------------------------------------------------------------------------
* SUMMARY OF DELTAS 0.1.3 from 0.1.2
*-----------------------------------------------------------------------------
- Updated to use all DSPs found in the PCIe subsystem. Previously it was
fixed to 4 dsps. This should allow multiple cards to be discovered and
used. It should also allow all 8 dsps on an octal card to be found.
Note: The octal card setup had not been tested.
- Updated the mandelbrot demo to use an image size of 720 instead of 500. 720
was chosen because it is divisible by 4, 5, 8, 9, and 16 which allows for a
simple division of labor across a number of configurations. It was also
updated to print the names of all devices being used for pixel generation.
*-----------------------------------------------------------------------------
* SUMMARY OF DELTAS 0.1.2 from 0.1.1
*-----------------------------------------------------------------------------
- The DSP compiler would sometimes fail to compile OpenCL C code that included
vector types, ie float2, int3, etc. This bug has been resolved.
- Certain OpenCL C kernels would cause a segmentation fault in the dynamic
loader in the OpenCL library. The gdb stack dump would list the function
process_rela_table.isra.7 as the faulting function. This bug has been
resolved.
- Version 0.1.1 would not allow local (__local) address qualified variables to
be defined in function scope. This bug has been resolved.
- The OpenCL C as_<type> builtins have been added.
- Version 0.1.2 will introduce a 32bit version of the library.
*-----------------------------------------------------------------------------
* SUMMARY OF DELTAS 0.1.1 from 0.1.0
*-----------------------------------------------------------------------------
- Stability improvements.
- Install and uninstall improvements. The product is no longer dependent on a
specific linux kernel version. It does still depend on a kernel module, but
the source for that kernel module is shipped with the installation packages
and is made on the users' machine.
- Added example simple.
*-----------------------------------------------------------------------------
* PRODUCT DESCRIPTION
*-----------------------------------------------------------------------------
This product is an OpenCL 1.1 implementation. The OpenCL specification
defines a platform model with a HOST and COMPUTE DEVICES. For this
implementation the HOST is a 64-bit x86 Linux machine and the COMPUTE DEVICES
are 4 Texas Instruments' TMS320C6678 DSP's resident on a PCIe card installed
in the Linux machine. The x86_64 cpu is also exposed as a fifth compute device
in this implementation.
*-----------------------------------------------------------------------------
* HARDWARE AND OS REQUIREMENTS
*-----------------------------------------------------------------------------
- Ubuntu 12.04 installation running on an x86 machine.
- An installed Advantech DSPC8681 quad DSP PCIe card configured for little
endian operation.
*-----------------------------------------------------------------------------
* OPENCL DOCUMENTATION
*-----------------------------------------------------------------------------
The OpenCL 1.1 specification and the 1.1 C++ bindings specification from
Khronos are included in $(TI_OCL_INSTALL)/doc.
Additional OpenCL resources can be found on the web. Some links are provided
below.
The OpenCL 1.1 on-line manual pages can be found at:
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/
The following page contains links to other OpenCL resources, including books
that may be helpful to you:
http://www.khronos.org/opencl/resources
*-----------------------------------------------------------------------------
* LIMITATIONS
*-----------------------------------------------------------------------------
- This is an early alpha version of this product. It is complete enough to be
useful under some circumstances and as such we would like to get feedback
from a select number of early adopters. However, it is by no means a
complete or compliant implementation. Taking an existing OpenCL application
and linking it against this implementation will not likely produce expected
results. Some of the major incomplete areas of the implementation are listed
below. The below list is not a complete list of limitations.
- OpenCL C is not yet fully supported. In particular,
- Kernel arguments less than 32 bits in size cannot be passed to kernels.
- There is a limit of 10 arguments that may be passed to kernels.
- Structures may not be passed as arguments to kernels.
- Only a few OpenCL C built-in functions are supported.
- The math fcns that are also part of the std C library are supported.
- The work group identification functions are supported, i.e.
get_global_id(), get_local_id(), get global_size(), etc.
- The barrier and synchronization functions are not supported.
- OpenCL Out of order Queues (OOOQs) are not yet supported. OOOQs allow
enqueued kernels to be serviced before a prior kernel is completed. This
behavior is particularly beneficial if you enqueue tasks rather than
NDRangekernels. When OOOQs are supported up to 8 enqueued tasks can be in
flight simultaneously per DSP device. OOOQs will also allow overlapped I/O
and compute operations allowing a double buffering or pipeline operation.
For this alpha, only one operation at a time is active within a Queue.
- An OpenCL ICD (Installable Client Driver) is provided with this product, but
it will not discover the TI OpenCL implementation. The OpenCL ICD is a
standard OpenCL library that will discover all installed OpenCL
implementations on a system and will allow the application to choose a
platform and dispatch through that platform's implementation. The TI OpenCL
implementation is not yet ICD compatible and therefore will not be
discovered. The ICD library can however be used to discover and dispatch to
other vendor implementations.
- The clEnqueueCreateBuffer flags CL_MEM_USE_HOST_MEMORY,
CL_MEM_ALLOC_HOST_MEMOY, CL_MEM_COPY_HOST_MEMORY are not yet implemented and
will simply be ignored.
- The OpenCL clEnqueueMapBuffer and clEnqueueUnmapBuffer operations not yet
supported.
- OpenCL Images and Samplers are optional features for non GPU devices and are
not supported for the DSP devices.
- The OpenCL api allows for either on-line or off-line compilation of OpenCL C
kernels. This release only supports the on-line compilation mode for OpenCL C
code. As a result, clCreateProgramFromBinaries is not supported yet, nor is
querying OpenCL for the binaries associated with a Program object.
- Even though off-line compilation for OpenCL C code is not yet
supported, OpenCL C code can call standard C code functions and the
standard C code functions can be compiled off-line. An example
illustrating this flow is included in the examples sub-directory.
The standard C code functions that are called should not include
code that: resets the device, allocates memory blocks that may
conflict with the OpenCL runtime, change the cache configuration,
etc. OpenCL C code calling C++ code is not supported.
- Also, compilations of OpenCL C code are cached on the system. If you
run an OpenCL application that on-line compiles some OpenCL C code,
the resultant binaries are cached on the system and the next time
you run the opencl application, the compilation step is skipped and
the cached binaries are used. The caching only uses the OpenCL C
code and the compile options as a hash, so an example where the
OpenCL C code is calling a C function in a linked object file or
library and the object file or library is modified will result in an
execution of the OpenCL C linked against the older version of the
object. In this case you will need to clear the OpenCL C compile
cache, which can be accomplished with the command
"rm -f /tmp/opencl*".
*-----------------------------------------------------------------------------
* EXAMPLE OPERATION
*-----------------------------------------------------------------------------
There are several OpenCL examples shipped with the product. I'll explain the
motivation behind each and the steps needed for execution.
IMPORTANT NOTE: For any of these examples or any OpenCL code you write,
execution of the code will sometimes appear to hang. This is due to a known
issue in the first communication between the Host and the DSP. It occurs
intermittently and will be fixed in later releases. There is a decription in
the LIMITATIONS section of this readme describing workarounds for this problem.
PLATFORM EXAMPLE
----------------
The platform example uses the OpenCL C++ bindings to discover key platform and
device information from the OpenCL implementation and print it to the screen.
To print the information from the TI OpenCL implementation:
1. cd $TI_OCL_INSTALL/examples/platform
2. make
3. ./platform
To print the information from the Any other vendors OpenCL implementation
installed on the system:
1. cd $TI_OCL_INSTALL/examples/platform
2. make icd
3. ./platform_icd
The Makefile in this example directory also illustrates the difference between
linking for the TI implementation of OpenCL and the ICD.
SIMPLE EXAMPLE
-------------
This example simply illustrates the minimum steps needed to dispatch a kernel
to one DSP device and read a buffer of data back.
To run this example:
1. cd $TI_OCL_INSTALL/examples/simple
2. make
3. ./simple
MANDELBROT EXAMPLE
------------------
The mandelbrot example is a nicely visual OpenCL demo that uses OpenCL to
generate the pixels of a mandelbrot set image. This example also use the C++
OpenCL binding. The OpenCL kernels are repeatedly called generating images that are zoomed in from the previous image. This repeats until the zoom factor
reaches 1E15 or essentially the resolution of a double floating point value.
This example illustrates several key OpenCL features:
- It illustrates 4 OpenCL Q's tied to each of the 4 DSPs and a dispatch
structure that allows the 4 DSPs to cooperatively generate pixel data.
- It also illustrates the event wait feature of OpenCL.
- It illustrates the division of one time setup of OpenCL to the repetitive
enqueuing of kernels.
- It also illustrates the ease in which kernels can be shifted from one
device type to another.
To run this demo:
1. cd $TI_OCL_INSTALL/examples/mandelbrot
2. make
3. ./mandelbrot dsp
4. ./mandelbrot cpu
5. ./mandelbrot all
Step 3 will run the pixel generating kernels on the DSPs.
Step 4 will run the pixel generating kernels on all the CPU cores in the
system.
Step 5 will use both the DSPs and the CPU cores to generate the pixels.
The makefile in this example is also ICD enabled. You can
1. cd $TI_OCL_INSTALL/examples/mandelbrot
2. make icd
3. ./mandelbrot intel "If an Intel OpenCL implementation exists"
4. ./mandelbrot nvidia "If an Nvidia OpenCL implementation exists"
CCODE EXAMPLE
-------------
This example illustrates the TI extension to OpenCL that allows OpenCL C code
to call standard C code that has been compiled off-line into an object file or
static library. This mechanism can be used to allow optimized C or C callable
assembly routines to be called from OpenCL C code. It can also be used to
essentially dispatch a standard C function, by wrapping it with an OpenCL C
wrapper. Calling C++ routines from OpenCL C is not yet supported. You should
also ensure that the Standard C function and the call tree resulting from the
standard C function do not allocate device memory, change the cache structure,
or use any resources already being used by the OpenCL runtime.
To run this example:
1. cd $TI_OCL_INSTALL/examples/ccode
2. make
3. ./ccode
*-----------------------------------------------------------------------------
* ENVIRONMENT VARIABLES
*-----------------------------------------------------------------------------
TI_OCL_DSP_1_25GHZ: If this environment variable is set, then the DSPs will be
configured to run at 1.25Ghz instead of the standard 1.00
Ghz.
TI_OCL_KEEP_FILES: When OpenCL C kernels are compiled for DSPs, they are
compiled to a binary .out file in the /tmp sub-directory.
They are then subsequently available for download to the
DSPs for running. The process of compiling generates
several intermediate files for each source file. The
OpenCL typically removes these temporary files. However,
it can sometimes be useful to inspect these files. This
environment variable can be set to instruct the runtime to
leave the temporary files in /tmp. This can be useful to
inspect the assembly file associated with the out file, to
see how well your code was optimized.
TI_OCL_DEBUG_KERNEL: The TI IDE and debugger Code Composer Studio (CCS) is not
required for running OpenCL applications with this
product, but if you do have CCS installed and and emulator
connected to you PCIe card, you can set this environment
variable to enable assembly statement level debug of you
kernel. When set, this environment variable will instruct
the OpenCL runtime to pause before dispatch of a kernel.
While paused the runtime will display data to the user
indicating that a kernel dispatch is pending. It will
instruct the user to connect to the board through an
emulator and will display the appropriate breakpoint
address to used for the start of the kernel code. Having
CCS and the emulator insert itself into a running OpenCL
application can cause instability in the system in this
release and may require a power cycle to the board. Debug
capability has not been a focus for this alpha release and
will definitely improve in later releases. Setting up the
emulator and CCS is outside the scope of this readme. If
you do have those products, consult the documentation
specific to those products.
*-----------------------------------------------------------------------------
* NOTICES
*-----------------------------------------------------------------------------
* Product is based on a published Khronos Specification, and is expected to
pass the Khronos Conformance Testing Process. Current conformance status can
be found at www.khronos.org/conformance.
|