Hi! This work represents an interesting idea: what would the performance improvement be from using OpenCL/Renderscript to accelerate SQLite on an ARM-based device? (Yes, non-ARM counts too.)

Goals:
0) Stay compatible with the SQLite APIs.
1) Accelerate SQLite for the general case.
   a) SQL -> machine-generated OpenCL
   b) SQL -> HSAIL / SPIR
   c) SQL -> Renderscript
2) Support running on Android, Linux and OS X.
3) Support use of Renderscript on Android.

Much of what is possible isn't tied to SQLite and could easily work for other databases or object data stores.

You can find me on IRC at irc.freenode.net (look for tgall, tgall_foo or Dr_Who). I'm usually in #linaro and #linaro-gfx, as well as other channels. I'm also tom_gall on Twitter and G+. My blog can be found at http://fullshovel.wordpress.com, where I post performance findings from time to time.

-----------------------------------------------------------------------------------------
v.01

Initial git commit (and push). It is in every way an early prototype. It has bugs. It is incomplete. It does not solve general-purpose problems.

13 test SQL statements run and yield positive results. I'm in the midst of converting over to make more use of vectors, which is yielding even more performance.

Known bugs:
- The vectorized versions, sql1rc.cl and sql3rc.cl, both miss two rows that they should be matching.

-----------------------------------------------------------------------------------------
Build:

Run ./build.sh to build. Note that the build system sucks. And when I say sucks, I really mean it. I'm basically hard-locked to the Mali device drivers on a Chromebook, but it would be really simple to adjust that for your system.

This will build sq-cl and sq-cl.dbg. sq-cl is nothing more than a driver for the 13 test SQL statements; depending on the parameters passed, it runs an OpenCL implementation of one of those test statements.

Run:

This is REALLY rough right now and will change a lot as things become more general purpose.
I needed a way to drive a sort of API design. It's ugly now, in order to feel out what that might look like on the journey to addressing the general case.

Ex:
  sq-cl sql1.cl 1 1 1 0 0 0 0 s

The first parameter is the name of the currently hand-coded OpenCL kernel to use. The next 7 numbers are a bitmask of which columns the query will use from the database (which, by the way, is currently hard-coded): 1 means use that column, 0 means don't. The next value is either s or f, for slow or fast, which selects 64 or 128 work units.

For sql11-sql13 there is an additional parameter (a, b or c) due to the different type of query being used. For sql1rc.cl, sql3rc.cl and the other OpenCL kernels that use the faster vector approach, use the parameter d.

-----------------------------------------------------------------------------------------
Where from here?

- Convert the rest of the 13 test SQL statements to vectorized versions.
- Convert to using column shards for everything.
- Start to hook into SQLite's SQL engine.
- Generate skeleton OpenCL kernels.
- Determine where the break-even point is for shipping operations to the GPU.
- Clean up, including resource cleanup.
- Deal with 64-bit types properly on 64-bit systems.
- Add use of autotools or CMake.

Oh, I take patches. Please. Seriously.

-----------------------------------------------------------------------------------------
Thank you.

Thanks to Peter Bakkum and Kevin Skadron for their CUDA-based accelerated SQLite. Your paper http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_tr.pdf inspired me to attempt this.

Thanks to David Rusling, Mark Orvek and the Linaro TSC for supporting this project.

Thanks to Gil Pitney from TI and Show Liu from Fujitsu, who make up the other members of the GPGPU subteam at Linaro.

Cheers!

Tom
Graphics Working Group Tech Lead, Linaro