Global and Local Work Size in OpenCL
The work-items in a given work-group execute concurrently on the processing elements of a single compute unit. This is a critical point in understanding the concurrency in OpenCL. ... OpenCL only assures that the work-items within a work-group execute concurrently (and share processor resources on the device).
- global work offset: an offset added to the values returned by get_global_id() in the kernel, so that global IDs start at the offset instead of 0.
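- global work size: the number of work-items, given per dimension, that will execute the kernel. The total is the product over all dimensions; these work-items are not guaranteed to all run in parallel.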
- local work size: the number of work-items grouped together in each work-group.
- The total number of work-items in a work-group is computed as local_work_size[0] * ... * local_work_size[work_dim - 1].
- The total number of work-items in the work-group must be less than or equal to the CL_DEVICE_MAX_WORK_GROUP_SIZE value specified in the table of OpenCL Device Queries for clGetDeviceInfo, and
- the number of work-items specified in local_work_size[0], ..., local_work_size[work_dim - 1] must be less than or equal to the corresponding values specified by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], ..., CL_DEVICE_MAX_WORK_ITEM_SIZES[work_dim - 1].
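
The rules above can be sketched in plain C without touching the OpenCL API. This is only an illustration under stated assumptions: the helper names (work_group_total, local_size_fits, global_id) are hypothetical, and the limits are passed in as ordinary arguments, whereas a real host program would obtain them from clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES. The last helper shows how the global work offset shifts the value a work-item would see from get_global_id() in one dimension, assuming the global size is a multiple of the local size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: total work-items in one work-group, i.e.
   local_work_size[0] * ... * local_work_size[work_dim - 1]. */
static size_t work_group_total(const size_t *local_work_size, unsigned work_dim)
{
    assert(local_work_size != NULL);
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d)
        total *= local_work_size[d];
    return total;
}

/* Hypothetical helper: checks both limits from the list above.
   max_work_group_size stands in for CL_DEVICE_MAX_WORK_GROUP_SIZE and
   max_work_item_sizes for CL_DEVICE_MAX_WORK_ITEM_SIZES; real code would
   query both values with clGetDeviceInfo. */
static bool local_size_fits(const size_t *local_work_size, unsigned work_dim,
                            size_t max_work_group_size,
                            const size_t *max_work_item_sizes)
{
    /* Per-dimension limit. */
    for (unsigned d = 0; d < work_dim; ++d) {
        if (local_work_size[d] == 0 ||
            local_work_size[d] > max_work_item_sizes[d])
            return false;
    }
    /* Limit on the product over all dimensions. */
    return work_group_total(local_work_size, work_dim) <= max_work_group_size;
}

/* Global ID of a work-item in one dimension:
   offset + group_id * local_size + local_id. */
static size_t global_id(size_t offset, size_t group_id,
                        size_t local_size, size_t local_id)
{
    return offset + group_id * local_size + local_id;
}
```

For example, a 16 x 16 local work size contains 256 work-items, so it passes on a device whose maximum work-group size is 256 but fails on one whose maximum is 128, even though each dimension on its own is within the per-dimension limits.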
References:
- clEnqueueNDRangeKernel @ https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html
- Using OpenCL's Global Work Offset @ http://www.iterationzero.co.uk/?p=44
- Questions about global and local work size @ http://stackoverflow.com/questions/3957125/questions-about-global-and-local-work-size