Kernels for performing the Reduce operation.
Functions
| kernel void | reduce_min_f (global float4 *in, global float *out, local float *data, uint n) |
| | Performs a reduce operation on the columns of an array. |
| kernel void | reduce_max_ui (global uint4 *in, global uint *out, local uint *data, uint n) |
| | Performs a reduce operation on the columns of an array. |
kernel void reduce_max_ui (global uint4 *in, global uint *out, local uint *data, uint n)
Performs a reduce operation on the columns of an array.
Computes the maximum element for each row in an array.
The number of elements, N, in a row of the array should be a multiple of 4 (the data are handled as uint4). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of elements in a row of the array divided by 8. That is, \( gXdim \geq N/8 \). Each work-item handles 8 uint (= 2 uint4) elements in a row of the array. The y dimension of the global workspace, \( gYdim \), should be equal to the number of rows, M, in the array. That is, \( gYdim = M \). The local workspace should be 1 in the y dimension, and a power of 2 in the x dimension. It is recommended to use one wavefront/warp per work-group.
[...] 0, in the output array, since in the next phase the data are going to be handled as uint4.
| [in] | in | input array of uint elements. |
| [out] | out | (reduced) output array of uint elements. When the kernel is dispatched with one work-group per row, the array contains the final results, and its size should be \( rows*sizeof\ (uint) \). When the kernel is dispatched with more than one work-group per row, the array contains the results from each block reduction, and its size should be \( wgXdim*rows*sizeof\ (uint) \). |
| [in] | data | local buffer. Its size should be 2 uint elements for each work-item in a work-group. That is \( 2*lXdim*sizeof\ (uint) \). |
| [in] | n | number of elements in a row of the array divided by 4. |
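The sizing rules above can be collected into a small host-side helper. The following is a minimal sketch in plain C, not code from the library: the `reduce_plan` type and the `plan_reduce` function are hypothetical names introduced for illustration. It assumes that \( gXdim \) is rounded up to a whole number of work-groups (OpenCL 1.x requires the global size to be a multiple of the local size) and that the kernel tolerates the resulting surplus work-items, which the \( gXdim \geq N/8 \) condition suggests but the documentation does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type; not part of the documented kernels. */
typedef struct {
    size_t global_x;    /* gXdim: work-items along x                  */
    size_t global_y;    /* gYdim: one work-item row per array row (M) */
    size_t local_x;     /* lXdim: work-group width, a power of 2      */
    size_t groups_x;    /* wgXdim: work-groups per row                */
    size_t local_bytes; /* size of the `data` local buffer            */
    size_t out_bytes;   /* size of the `out` buffer                   */
    unsigned n;         /* kernel argument n = N / 4                  */
} reduce_plan;

/* cols = N (elements per row), rows = M, elem_size = sizeof(uint) or sizeof(float). */
static reduce_plan plan_reduce(size_t cols, size_t rows,
                               size_t local_x, size_t elem_size)
{
    reduce_plan p;

    /* Documented preconditions: N is a multiple of 4, lXdim is a power of 2. */
    assert(cols > 0 && cols % 4 == 0);
    assert(local_x != 0 && (local_x & (local_x - 1)) == 0);

    /* Each work-item handles 8 elements (2 vectors of 4). */
    size_t items = (cols + 7) / 8;

    p.local_x     = local_x;
    p.groups_x    = (items + local_x - 1) / local_x; /* round up to whole groups */
    p.global_x    = p.groups_x * local_x;            /* satisfies gXdim >= N/8   */
    p.global_y    = rows;
    p.local_bytes = 2 * local_x * elem_size;         /* 2 elements per work-item */
    p.out_bytes   = p.groups_x * rows * elem_size;   /* wgXdim * rows * sizeof   */
    p.n           = (unsigned)(cols / 4);
    return p;
}
```

For example, `plan_reduce(1024, 16, 64, 4)` yields two work-groups per row, so `out` would hold 2 × 16 block results and a further reduction pass would be needed; with 512 elements per row the same call yields a single work-group per row and `out` holds the final results.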
kernel void reduce_min_f (global float4 *in, global float *out, local float *data, uint n)
Performs a reduce operation on the columns of an array.
Computes the minimum element for each row in an array.
The number of elements, N, in a row of the array should be a multiple of 4 (the data are handled as float4). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of elements in a row of the array divided by 8. That is, \( gXdim \geq N/8 \). Each work-item handles 8 float (= 2 float4) elements in a row of the array. The y dimension of the global workspace, \( gYdim \), should be equal to the number of rows, M, in the array. That is, \( gYdim = M \). The local workspace should be 1 in the y dimension, and a power of 2 in the x dimension. It is recommended to use one wavefront/warp per work-group.
[...] INFINITY, in the output array, since in the next phase the data are going to be handled as float4.
| [in] | in | input array of float elements. |
| [out] | out | (reduced) output array of float elements. When the kernel is dispatched with one work-group per row, the array contains the final results, and its size should be \( rows*sizeof\ (float) \). When the kernel is dispatched with more than one work-group per row, the array contains the results from each block reduction, and its size should be \( wgXdim*rows*sizeof\ (float) \). |
| [in] | data | local buffer. Its size should be 2 float elements for each work-item in a work-group. That is \( 2*lXdim*sizeof\ (float) \). |
| [in] | n | number of elements in a row of the array divided by 4. |
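When checking the output of `reduce_min_f` on the host, a sequential reference is useful. The sketch below is plain C and is not taken from the library: `row_min_reference` and `block_min_reference` are hypothetical names. The first computes the final per-row minimum. The second computes one minimum per block of `block_elems` consecutive elements and stores it at `out[r * groups + g]`; that row-major layout of the block results is an assumption, since the documentation gives only the total size \( wgXdim*rows*sizeof\ (float) \).

```c
#include <assert.h>
#include <math.h>    /* INFINITY macro only; no libm functions are called */
#include <stddef.h>

/* Reference result: out[r] = minimum of row r of a rows x cols array. */
static void row_min_reference(const float *in, size_t rows, size_t cols,
                              float *out)
{
    for (size_t r = 0; r < rows; ++r) {
        float m = INFINITY;              /* identity element for min */
        for (size_t c = 0; c < cols; ++c) {
            float v = in[r * cols + c];
            if (v < m) m = v;
        }
        out[r] = m;
    }
}

/* Block results: one minimum per block of block_elems elements in each row.
 * Assumed layout (not confirmed by the documentation): out[r * groups + g]. */
static void block_min_reference(const float *in, size_t rows, size_t cols,
                                size_t block_elems, float *out)
{
    assert(block_elems > 0);
    size_t groups = (cols + block_elems - 1) / block_elems;

    for (size_t r = 0; r < rows; ++r) {
        for (size_t g = 0; g < groups; ++g) {
            size_t begin = g * block_elems;
            size_t end = begin + block_elems;
            if (end > cols) end = cols;  /* last block may be partial */

            float m = INFINITY;
            for (size_t c = begin; c < end; ++c) {
                float v = in[r * cols + c];
                if (v < m) m = v;
            }
            out[r * groups + g] = m;
        }
    }
}
```

Since each work-item handles 8 elements, a work-group of width \( lXdim \) covers `8 * lXdim` elements of a row, so that value is the natural choice for `block_elems` when comparing against a multi-work-group dispatch.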