Research into optimizing OpenCL reduction kernels with self-tuning code. GPU kernels are tuned by trading off global memory latency against bank conflicts; CPU kernels are tuned by selecting the vector size.
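
As a rough illustration of the self-tuning idea, the sketch below times a sum reduction for several candidate widths and keeps the fastest. It is a host-side Python emulation, not the project's OpenCL code: the names `reduce_with_width` and `autotune`, the candidate widths, and the repeat count are all hypothetical, and the "vector width" is emulated with strided partial sums rather than real vector types.

```python
import time


def reduce_with_width(data, width):
    """Sum `data` using `width` independent accumulators (emulated vector lanes)."""
    lanes = [0] * width
    for lane in range(width):
        # Lane i accumulates elements i, i + width, i + 2 * width, ...
        lanes[lane] = sum(data[lane::width])
    # Final step: combine the per-lane partial sums into one result.
    return sum(lanes)


def autotune(data, candidate_widths=(1, 2, 4, 8, 16), repeats=3):
    """Time each candidate width and return (best_width, timings).

    Assumption: the best-of-`repeats` wall-clock time is a good enough
    cost measure for this sketch.
    """
    expected = sum(data)
    timings = {}
    for width in candidate_widths:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            result = reduce_with_width(data, width)
            best = min(best, time.perf_counter() - start)
        # A tuned variant is only acceptable if it still gives the right answer.
        if result != expected:
            raise ValueError(f"width {width} produced a wrong sum")
        timings[width] = best
    best_width = min(timings, key=timings.get)
    return best_width, timings


if __name__ == "__main__":
    data = list(range(100_000))
    best_width, timings = autotune(data)
    print("selected width:", best_width)
```

The selected width depends on the machine and on the input size, which is the reason for measuring at run time instead of hard-coding a value. The same loop structure would apply to a GPU parameter such as work-group size, with the kernel launch taking the place of `reduce_with_width`.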