RandomBallCover  1.2.1
reduce_kernels.cl File Reference

Kernels for performing the Reduce operation.

Functions

kernel void reduce_min_f (global float4 *in, global float *out, local float *data, uint n)
 Performs a reduce (minimum) operation across the columns of an array, yielding one result per row.
 
kernel void reduce_max_ui (global uint4 *in, global uint *out, local uint *data, uint n)
 Performs a reduce (maximum) operation across the columns of an array, yielding one result per row.
 

Detailed Description

Kernels for performing the Reduce operation.

Author
Nick Lamprianidis
Version
1.0
Date
2015
Copyright (c) 2015 Nick Lamprianidis
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Function Documentation

kernel void reduce_max_ui (global uint4 *in, global uint *out, local uint *data, uint n)

Performs a reduce operation across the columns of an array, yielding one result per row.

Computes the maximum element for each row in an array.

Note
When there are multiple rows in the array, a reduce operation is performed per row, in parallel.
The number of elements, N, in a row of the array should be a multiple of 4 (the data are handled as uint4). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of elements in a row of the array divided by 8, that is, \( gXdim \geq N/8 \), because each work-item handles 8 uint (= 2 uint4) elements in a row of the array. The y dimension of the global workspace, \( gYdim \), should be equal to the number of rows, M, in the array, that is, \( gYdim = M \). The local workspace should be 1 in the y dimension and a power of 2 in the x dimension. It is recommended to use one wavefront/warp per work-group.
When the number of elements per row of the array is small enough to be handled by a single work-group, the output array contains the final maximums. When a row has more elements than that, the elements are partitioned into blocks that are reduced independently, and the kernel outputs the maximum of each block. A further reduction should then be made on those block maximums to obtain the final results. In the case of multiple work-groups, the number of work-groups in the x dimension, \( wgXdim \), should be a multiple of 4. Any extra work-groups exist only to keep the next phase correct: they write the identity operand of the max operation, 0, to the output array, since in the next phase the data are handled as uint4.
Parameters
[in]   in    input array of uint elements.
[out]  out   (reduced) output array of uint elements. When the kernel is dispatched with one work-group per row, the array contains the final results, and its size should be \( rows*sizeof\ (uint) \). When the kernel is dispatched with more than one work-group per row, the array contains the results from each block reduction, and its size should be \( wgXdim*rows*sizeof\ (uint) \).
[in]   data  local buffer. Its size should be 2 uint elements for each work-item in a work-group, that is, \( 2*lXdim*sizeof\ (uint) \).
[in]   n     number of elements in a row of the array divided by 4 (that is, the number of uint4 elements per row).
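The sizing rules above can be collected into a small host-side helper. The following is a minimal sketch in C, not code from the library: the names `reduce_plan` and `plan_reduce` are hypothetical, and it assumes the host launches the smallest number of work-groups that covers a row (each work-group covering \( 8*lXdim \) elements), rounded up to a multiple of 4 only when more than one work-group is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type (not part of the library): launch geometry
 * and buffer sizes for one dispatch of reduce_max_ui. */
typedef struct {
    size_t   global[2];    /* {gXdim, gYdim} */
    size_t   local[2];     /* {lXdim, 1} */
    size_t   wg_x;         /* wgXdim: work-groups per row */
    uint32_t n_arg;        /* kernel argument n = N / 4 */
    size_t   out_bytes;    /* size of the out buffer */
    size_t   local_bytes;  /* size of the local data buffer */
} reduce_plan;

/* Fills *p for an M x N array and a work-group of lx work-items.
 * Returns 0 on success, -1 if the documented constraints are violated. */
static int plan_reduce(size_t N, size_t M, size_t lx, reduce_plan *p)
{
    if (N == 0 || M == 0 || N % 4 != 0)
        return -1;                       /* N must be a multiple of 4 */
    if (lx == 0 || (lx & (lx - 1)) != 0)
        return -1;                       /* lXdim must be a power of 2 */

    size_t per_group = 8 * lx;           /* each work-item handles 8 elements */
    size_t wg = (N + per_group - 1) / per_group;
    if (wg > 1)
        wg = (wg + 3) / 4 * 4;           /* multiple of 4 for the next phase */

    p->global[0]   = wg * lx;            /* satisfies gXdim >= N/8 */
    p->global[1]   = M;
    p->local[0]    = lx;
    p->local[1]    = 1;
    p->wg_x        = wg;
    p->n_arg       = (uint32_t)(N / 4);
    p->out_bytes   = wg * M * sizeof(uint32_t);
    p->local_bytes = 2 * lx * sizeof(uint32_t);
    return 0;
}
```

For example, with N = 1024, M = 3 and lXdim = 64, two work-groups would cover a row, so the sketch rounds up to wgXdim = 4 and gXdim = 256. The resulting 4 x 3 array of block maximums can then be reduced by a second dispatch with one work-group per row.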
kernel void reduce_min_f (global float4 *in, global float *out, local float *data, uint n)

Performs a reduce operation across the columns of an array, yielding one result per row.

Computes the minimum element for each row in an array.

Note
When there are multiple rows in the array, a reduce operation is performed per row, in parallel.
The number of elements, N, in a row of the array should be a multiple of 4 (the data are handled as float4). The x dimension of the global workspace, \( gXdim \), should be greater than or equal to the number of elements in a row of the array divided by 8, that is, \( gXdim \geq N/8 \), because each work-item handles 8 float (= 2 float4) elements in a row of the array. The y dimension of the global workspace, \( gYdim \), should be equal to the number of rows, M, in the array, that is, \( gYdim = M \). The local workspace should be 1 in the y dimension and a power of 2 in the x dimension. It is recommended to use one wavefront/warp per work-group.
When the number of elements per row of the array is small enough to be handled by a single work-group, the output array contains the final minimums. When a row has more elements than that, the elements are partitioned into blocks that are reduced independently, and the kernel outputs the minimum of each block. A further reduction should then be made on those block minimums to obtain the final results. In the case of multiple work-groups, the number of work-groups in the x dimension, \( wgXdim \), should be a multiple of 4. Any extra work-groups exist only to keep the next phase correct: they write the identity operand of the min operation, INFINITY, to the output array, since in the next phase the data are handled as float4.
Parameters
[in]   in    input array of float elements.
[out]  out   (reduced) output array of float elements. When the kernel is dispatched with one work-group per row, the array contains the final results, and its size should be \( rows*sizeof\ (float) \). When the kernel is dispatched with more than one work-group per row, the array contains the results from each block reduction, and its size should be \( wgXdim*rows*sizeof\ (float) \).
[in]   data  local buffer. Its size should be 2 float elements for each work-item in a work-group, that is, \( 2*lXdim*sizeof\ (float) \).
[in]   n     number of elements in a row of the array divided by 4 (that is, the number of float4 elements per row).
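To illustrate why padding the block results with INFINITY leaves the final minimum unchanged, here is a CPU-only sketch in C that emulates the two-phase scheme for a single row. It is not the kernel's implementation: the function names `block_min` and `two_phase_min` are hypothetical, and the sketch assumes that each work-group reduces a contiguous block of \( 8*lXdim \) elements, which the kernel itself may partition differently.

```c
#include <math.h>    /* INFINITY, NAN */
#include <stddef.h>
#include <stdlib.h>

/* Emulates one dispatch for a single row: group g reduces the elements
 * [g*block, (g+1)*block) that lie inside the row. Groups that fall
 * entirely past the end of the row write the identity operand, INFINITY. */
static void block_min(const float *in, size_t N, size_t block,
                      float *out, size_t groups)
{
    for (size_t g = 0; g < groups; ++g) {
        float m = INFINITY;
        size_t begin = g * block;
        for (size_t i = begin; i < N && i - begin < block; ++i)
            if (in[i] < m)
                m = in[i];
        out[g] = m;
    }
}

/* Two-phase minimum of a row of N floats (N > 0, N a multiple of 4)
 * with lx work-items per work-group. Returns NAN if allocation fails. */
static float two_phase_min(const float *row, size_t N, size_t lx)
{
    size_t block  = 8 * lx;                  /* elements per work-group */
    size_t groups = (N + block - 1) / block;
    if (groups > 1)
        groups = (groups + 3) / 4 * 4;       /* pad wgXdim to a multiple of 4 */

    float *partial = malloc(groups * sizeof *partial);
    if (partial == NULL)
        return NAN;

    block_min(row, N, block, partial, groups);       /* phase 1: per block */

    float result;
    block_min(partial, groups, groups, &result, 1);  /* phase 2: final */

    free(partial);
    return result;
}
```

With a row of 12 elements and lXdim = 1, two blocks hold data and two more are padding, so phase 1 would produce two real minimums followed by two INFINITY entries. Since INFINITY never compares less than a finite value, the padded entries cannot affect the phase 2 result.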