OpenCLFunctionLoad

OpenCLFunctionLoad["src",fun,argtypes,blockdim]

compiles the string src and makes fun available in the Wolfram Language as an OpenCLFunction.

OpenCLFunctionLoad[File[srcfile],fun,argtypes,blockdim]

compiles the source code file srcfile and then loads fun as an OpenCLFunction.

OpenCLFunctionLoad[File[libfile],fun,argtypes,blockdim]

loads fun as an OpenCLFunction from the previously compiled library libfile.

OpenCLLink`

Details and Options

The OpenCLLink application must be loaded using Needs["OpenCLLink`"].
If libfile is a dynamic library, then the dynamic library function fun is loaded.
Possible argument and return types, and their corresponding OpenCL type, include:

_Integer	mint	Wolfram Language integer
"Integer32"	int	32-bit integer
"Integer64"	long/long long	64-bit integer
_Real	Real_t	GPU real type
"Double"	double	machine double
"Float"	float	machine float
{base, rank, io}	OpenCLMemory	memory of specified base type, rank, and input/output option
"Local" | "Shared"	mint	local or shared memory parameter
{"Local" | "Shared", type}	mint	local or shared memory parameter

In the specification {base, rank, io}, valid settings of io are "Input", "Output", and "InputOutput".
The argument specification {base} is equivalent to {base,_,"InputOutput"}, and {base,rank} is equivalent to {base,rank,"InputOutput"}.
The rank can be omitted by using {base,_,io} or {base,io}.
Possible base types are:

_Integer	_Real	_Complex
"Byte"	"Bit16"	"Integer32"
"Byte[2]"	"Bit16[2]"	"Integer32[2]"
"Byte[4]"	"Bit16[4]"	"Integer32[4]"
"Byte[8]"	"Bit16[8]"	"Integer32[8]"
"Byte[16]"	"Bit16[16]"	"Integer32[16]"
"UnsignedByte"	"UnsignedBit16"	"UnsignedInteger"
"UnsignedByte[2]"	"UnsignedBit16[2]"	"UnsignedInteger[2]"
"UnsignedByte[4]"	"UnsignedBit16[4]"	"UnsignedInteger[4]"
"UnsignedByte[8]"	"UnsignedBit16[8]"	"UnsignedInteger[8]"
"UnsignedByte[16]"	"UnsignedBit16[16]"	"UnsignedInteger[16]"
"Double"	"Float"	"Integer64"
"Double[2]"	"Float[2]"	"Integer64[2]"
"Double[4]"	"Float[4]"	"Integer64[4]"
"Double[8]"	"Float[8]"	"Integer64[8]"
"Double[16]"	"Float[16]"	"Integer64[16]"

OpenCLFunctionLoad can be called more than once with different arguments.
Functions loaded by OpenCLFunctionLoad run in the same process as the Wolfram Language kernel.
Functions loaded by OpenCLFunctionLoad are unloaded when the Wolfram Language kernel exits.
Block dimensions can be either a list or an integer denoting how many threads per block to launch.
The maximum size of block dimensions is returned by the "Maximum Work Group Size" property of OpenCLInformation.
On launch, if the number of threads is not specified (as an extra argument to OpenCLFunction), then the dimension of the element with largest rank and dimension is chosen. For images, the rank is set to 2.
On launch, if the number of threads is not a multiple of the block dimension, then it is incremented to be a multiple of the block dimension.
The following options can be given:

"CompileOptions"	{}	compile options passed directly to the OpenCL compiler
"Defines"	Automatic	defines passed to the OpenCL preprocessor
"Device"	$OpenCLDevice	OpenCL device used in computation
"IncludeDirectories"	{}	directories to include in the compilation
"Platform"	$OpenCLPlatform	OpenCL platform used in computation
"ShellCommandFunction"	None	function to call with the shell commands used for compilation
"ShellOutputFunction"	None	function to call with the shell output of running the compilation commands
"TargetPrecision"	Automatic	precision used in computation
"WorkingDirectory"	Automatic	the directory in which temporary files will be generated
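
The launch rule described above (a thread count that is not a multiple of the block dimension is rounded up) can be sketched in plain C. This is only an illustration of the arithmetic; `round_up_threads` is a hypothetical helper name and is not part of OpenCLLink.

```c
#include <stddef.h>

/* Round a requested thread count up to the next multiple of the block
   dimension. Hypothetical helper for illustration only. */
size_t round_up_threads(size_t threads, size_t block_dim)
{
    size_t remainder = threads % block_dim;
    return remainder == 0 ? threads : threads + (block_dim - remainder);
}
```

For example, a request for 100 threads with a block dimension of 16 would launch 112 threads. Because extra threads may run, the kernels in the examples guard their memory accesses with a bounds check such as `if (index < length)`.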

Examples

Basic Examples (5)

First, load the OpenCLLink application:

Wolfram Language code: Needs["OpenCLLink`"]

Define the OpenCL source code to load:

Wolfram Language code:

src = "__kernel void myKernel( __global mint * global0Id, __global mint * global1Id, mint width, mint height) {
    int xIndex = get_global_id(0);
    int yIndex = get_global_id(1);
    int index = xIndex + yIndex*width;
	if (xIndex < width && yIndex < height) {
	   global0Id[index] = get_local_id(0);
       global1Id[index] = get_local_id(1);
    }
  }";

Load the OpenCL function:

Wolfram Language code: fun = OpenCLFunctionLoad[src, "myKernel", {{_Integer}, {_Integer}, _Integer, _Integer}, {16, 16}]

Define the input parameters:

Wolfram Language code:

width = 64;
height = 64;
global0Id = ConstantArray[0, {width, height}];
global1Id = ConstantArray[0, {width, height}];

Call the function with the arguments:

Wolfram Language code: res = fun[global0Id, global1Id, width, height];

Plot the result using ArrayPlot:

Wolfram Language code: ArrayPlot /@ res
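
To see what the two plots should contain, the kernel can be emulated on the CPU. The sketch below assumes a zero global offset, so that each local ID equals the global ID modulo the block dimension; `fill_local_ids` is a hypothetical name, and `long` stands in for `mint`.

```c
/* CPU emulation of myKernel: for every (x, y) work item, store the local
   ID it would have inside a block_x-by-block_y work group.
   Assumes a zero global offset, so local ID = global ID % block size. */
void fill_local_ids(long *global0, long *global1,
                    int width, int height, int block_x, int block_y)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int index = x + y * width;      /* same indexing as the kernel */
            global0[index] = x % block_x;   /* get_local_id(0) */
            global1[index] = y % block_y;   /* get_local_id(1) */
        }
    }
}
```

With a 64×64 input and 16×16 blocks, each array repeats the ramp 0 to 15 four times along one axis, which is the banded pattern ArrayPlot displays.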

Define the path to the OpenCL source file "SupportFiles/vectorAdd.cl":

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "vectorAdd.cl"}]

Compile and load the OpenCL function from the file:

Wolfram Language code:

vectorAdd = OpenCLFunctionLoad[File[srcf], "vectorAdd", {{_Integer, _, "Input"}, {_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 16]

This calls the function:

Wolfram Language code: vectorAdd[Range[32], ConstantArray[2, 32], ConstantArray[0, 32], 32]

Locate the example OpenCLLink library "addTwo_Double":

Wolfram Language code: libPath = FindLibrary["addTwo_Double"]

Load the library using OpenCLFunctionLoad:

Wolfram Language code: libFun = OpenCLFunctionLoad[File[libPath], "oAddTwo", {{_Integer, "Input"}, {_Integer, "Output"}}, 16];

The function adds two to an input list:

Wolfram Language code: libFun[ConstantArray[1, 16], ConstantArray[1, 16]]

The source code for this example is bundled with OpenCLLink:

Wolfram Language code: FileNameJoin[{$OpenCLLinkPath, "CSource", "addTwo.cl"}]

An extra argument can be given when calling OpenCLFunction. The argument denotes the number of threads to launch (or the global work group size). Using the previous example:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "vectorAdd.cl"}]

This loads the OpenCL function from the file:

Wolfram Language code:

vectorAdd = OpenCLFunctionLoad[File[srcf], "vectorAdd", {{_Integer, _, "Input"}, {_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 16]

This calls the function with 32 threads, which results in only the first 32 values in the vector add being computed:

Wolfram Language code: vectorAdd[Range[64], ConstantArray[2, 64], ConstantArray[0, 64], 256, 32]

If code contains syntax errors, then a "compilation failed" error is returned:

Wolfram Language code:

OpenCLFunctionLoad["__kernel void zero( __global mint * in, mint length) {
    int index = get_global_id(0);

	if (index < length)
	   in[index] = 0z;
  }", "zero", {{_Integer}, _Integer}, {10}];

The "ShellOutputFunction" option can be used to print the build log:

Wolfram Language code:

OpenCLFunctionLoad["__kernel void zero( __global mint * in, mint length) {
    int index = get_global_id(0);

	if (index < length)
	   in[index] = 0z;
  }", "zero", {{_Integer}, _Integer}, {10}, "ShellOutputFunction" -> Print];

The build log above points to a typo in the code: a stray z after the 0. With the typo removed, the kernel compiles:

Wolfram Language code:

OpenCLFunctionLoad["__kernel void zero( __global mint * in, mint length) {
    int index = get_global_id(0);

	if (index < length)
	   in[index] = 0;
  }", "zero", {{_Integer}, _Integer}, {10}, "ShellOutputFunction" -> Print]

Scope (2)

Templated Function (1)

Templated functions can be simulated using macros. Leave Generic_t as an undefined macro:

Wolfram Language code:

src = "__kernel void imageColorNegate(__global Generic_t * in, __global Generic_t * out, mint width, mint height, mint channels) {
    int ii;
	int xIndex = get_global_id(0);
	int yIndex = get_global_id(1);
	int index = channels*(xIndex + yIndex*width);
	if (xIndex < width && yIndex < height) {
		for (ii = 0; ii < channels; ii++)
			out[index+ii] = 255 - in[index+ii];
    }
}";

Set the Generic_t macro to mint during compilation:

Wolfram Language code:

intColorNegate = OpenCLFunctionLoad[src, "imageColorNegate", {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer, _Integer, _Integer}, {16, 16}, "Defines" -> {"Generic_t" -> "mint"}]

Set Generic_t to float instead:

Wolfram Language code:

intColorNegate = OpenCLFunctionLoad[src, "imageColorNegate", {{"Float", _, "Input"}, {"Float", _, "Output"}, _Integer, _Integer, _Integer}, {16, 16}, "Defines" -> {"Generic_t" -> "float"}]
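
The same macro trick works in ordinary C, which may make the mechanism easier to see. In this sketch the element type is a macro that a build would normally set on the compiler command line (for example `-DGeneric_t=float`), much as "Defines" does for the OpenCL compiler. The fallback to `int` and the name `color_negate` are choices made for this illustration only.

```c
/* Element type supplied by the build; fall back to int for this sketch. */
#ifndef Generic_t
#define Generic_t int
#endif

/* Host-side analogue of the imageColorNegate kernel body:
   negate n values in the 0..255 range. */
void color_negate(const Generic_t *in, Generic_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = 255 - in[i];
}
```

One source text then yields an integer or a floating-point version depending only on the definition passed at compile time.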

Shared or Local Memory (1)

OpenCLFunctionLoad can be used to specify "Local" or "Shared" memory on launch. The following code uses shared memory to store global memory for gradient computation:

Wolfram Language code:

code = "
__kernel void grad(__global mint * img, mint n, __local mint * smem) {
    int tx = get_local_id(0);
    int bx = get_group_id(0);
    int dx = get_local_size(0);
    int index = tx + bx*dx;

#define S(txOffset)    smem[txOffset + 1]
    S(tx) = img[index];
	if (tx == 0) {
		S(tx - 1) = index > 0 ? img[index - 1] : 0;
	} else if (tx == dx - 1) {
		S(tx + 1) = index < n-1 ? img[index + 1] : 0;
	}
	barrier(CLK_LOCAL_MEM_FENCE);
	
	// S() already adds the +1 halo offset, so tx is used unshifted here
	if (index < n)
		img[index] = (S(tx + 1) - S(tx-1))/2; 
}";

This specifies the input arguments, with the last argument being "Shared" for shared memory. The block size is set to 256:

Wolfram Language code: fun = OpenCLFunctionLoad[code, "grad", {{_Integer}, _Integer, "Shared"}, 256]

This computes the flattened length of a grayscale image:

Wolfram Language code: n = Times@@ImageDimensions[[image]];

This invokes the function. The shared memory size is set to (blockSize+2)*sizeof(int) and the number of launch threads is set to the flattened length of the image:

Wolfram Language code: fun[[image], n, (256 + 2) * 4, n]

A nicer way of specifying the shared memory size is using types:

Wolfram Language code: fun = OpenCLFunctionLoad[code, "grad", {{_Integer}, _Integer, {"Shared", _Integer}}, 256]

Using shared memory types, you need not pass in the size of the type:

Wolfram Language code: fun[[image], n, 256 + 2, n]
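
As a cross-check for the kernel above, here is a plain C reference for the intended result, a central difference with zeros assumed beyond both ends of the array, plus the shared memory size arithmetic. Both function names are hypothetical, and `int` stands in for the kernel's element type.

```c
#include <stddef.h>

/* Central-difference gradient with zero padding at both ends:
   out[i] = (in[i+1] - in[i-1]) / 2, using integer division. */
void grad_reference(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        int left  = (i > 0)     ? in[i - 1] : 0;
        int right = (i < n - 1) ? in[i + 1] : 0;
        out[i] = (right - left) / 2;
    }
}

/* Bytes of local memory for one block plus a one-element halo per side. */
size_t shared_bytes(size_t block_size, size_t element_size)
{
    return (block_size + 2) * element_size;
}
```

With a block size of 256 and 4-byte elements, `shared_bytes` gives 1032, the value passed as `(256 + 2) * 4` in the untyped "Shared" call.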

Applications (10)

Image Input (1)

The input can be images; here you write code that performs linear interpolation between images (this can be done using ImageCompose):

Wolfram Language code:

src = "
__kernel void linearCombine(__global mint * output, __global mint * input0, __global mint * input1, float a, float b, mint width, mint height, mint channels) {

	int xIndex = get_global_id(0);
	int yIndex = get_global_id(1);
	
	if (xIndex >= width || yIndex >= height)
		return ;
	int pos = channels * (yIndex * width + xIndex);
	for (int ii = 0; ii < channels; ii++) {
		output[pos + ii] = input0[pos + ii] * a + input1[pos + ii] * b;
	}
}
";

This loads OpenCLFunction from the source code above:

Wolfram Language code:

ImageLinearCombine = OpenCLFunctionLoad[src, "linearCombine", {{_Integer, _, "Output"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, "Float", "Float", _Integer, _Integer, _Integer}, {16, 16}]

This sets the height, width, and channel values. It also allocates memory for the output:

Wolfram Language code:

{height, width, channels} = Flatten[{ImageDimensions[[image]], ImageChannels[[image]]}];
output = OpenCLMemoryAllocate[Integer, {width, height, channels}]

This calls the function with {width,height} threads:

Wolfram Language code: ImageLinearCombine[output, [image], [image], 0.5, 0.5, width, height, channels, {width, height}]

This gets the memory and displays it as an image:

Wolfram Language code: Image[OpenCLMemoryGet[output], "Byte"]

You can take the above and make a function OpenCLImageLinearCombine:

Wolfram Language code:

OpenCLImageLinearCombine[input0_Image, a_Real, input10_Image, b_Real] := 
	Module[{input1, width, height, channels, output}, 
	input1 = If[ImageDimensions[input0] === ImageDimensions[input10], 
	input10, 
	ImageResize[input10, ImageDimensions[input0]]
	];
	{height, width, channels} = Flatten[{ImageDimensions[input0], ImageChannels[input0]}];
	output = OpenCLMemoryAllocate[Integer, {width, height, channels}];
	ImageLinearCombine[output, input0, input1, a, b, width, height, channels, {width, height}];
	With[{res = Image[OpenCLMemoryGet[output], "Byte"]}, 
	OpenCLMemoryUnload[output];
	res
	]
	]

The function now has similar syntax to ImageCompose:

Wolfram Language code: OpenCLImageLinearCombine[[image], -1.0, [image], 2.0]

A Manipulate can be used to play with the interpolation coefficients:

Wolfram Language code: Manipulate[OpenCLImageLinearCombine[[image], a, [image], b], {{a, 0.0}, -2.0, 1.0, 0.01}, {{b, 1.0}, -2.0, 2.0, 0.01}]

The function can also drive effects; in this example, it produces a smooth cross-fade animation:

Wolfram Language code: Animate[OpenCLImageLinearCombine[[image], ii, [image], 1 - ii], {ii, 0.0, 1.0}]

Uniform Random Number Generation (1)

Uniform random number generation is a common building block in many applications. This implements uniform random number generators in OpenCL:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "URNG_Kernels.cl"}]

This loads the source as an OpenCLFunction. This algorithm uses an image to provide an upper bound to the random number:

Wolfram Language code:

urng = OpenCLFunctionLoad[File[srcf], "noise_uniform", {{"Float[4]", _, "Input"}, {"Float[4]", _, "Output"}, _Integer}, {64, 1}]

This calls OpenCLFunction; note that you can pass images directly into an OpenCLFunction so long as it can be interpreted using the appropriate specified type:

Wolfram Language code: res = urng[[image], [image], 1, {512, 512}]

Notice that this is not a regular duck image; it is a 4-channel image with alpha channel set to 1 (using SetAlphaChannel):

Wolfram Language code: ImageChannels[[image]]

The random output can be used to detect important edges in an image:

Wolfram Language code: ImageAdd[[image], EdgeDetect[First[res], 15]]

Random Number Generation Using the Mersenne Twister (1)

The Mersenne Twister is another uniform random number generator algorithm (more sophisticated than the one mentioned above). The implementation is located here:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "MersenneTwister_kernel.cl"}]

This loads OpenCLFunction; you specify the type _Real, which means that the real type depends on the capabilities of the OpenCL device (whether it supports double precision or not):

Wolfram Language code:

mersenneTwister = OpenCLFunctionLoad[File[srcf], "MersenneTwister", {{_Real, _, "Output"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, _Integer}, 32]

This sets up the Mersenne Twister's input and output parameters (for more information, refer to the algorithm description):

Wolfram Language code:

MTRNGCount = 4096;
PATHN = 2 ^ 25;
NPerRNG = Ceiling[PATHN / MTRNGCount];
NPerRNG = If[EvenQ[NPerRNG], NPerRNG, NPerRNG + 1];
RANDN = MTRNGCount * NPerRNG;
{hsMatrixA, hsMaskB, hsMaskC} = RandomInteger[{-2147483647, 2147483647}, {3, MTRNGCount}];
hsSeed = RandomInteger[{-2147483647, 2147483647}, MTRNGCount];
output = OpenCLMemoryAllocate[Real, RANDN]

This invokes OpenCLFunction:

Wolfram Language code: mersenneTwister[output, hsMatrixA, hsMaskB, hsMaskC, hsSeed, NPerRNG, MTRNGCount]

This plots the output's results:

Wolfram Language code: ListPlot[OpenCLMemoryGet[output][[ ;; 1000]]]

Time the OpenCL call:

Wolfram Language code: mersenneTwister[output, hsMatrixA, hsMaskB, hsMaskC, hsSeed, NPerRNG, MTRNGCount];//AbsoluteTiming

The OpenCL version is almost 11× faster than the equivalent built-in Wolfram Language generator:

Wolfram Language code: BlockRandom[SeedRandom[1, Method -> "MersenneTwister"];RandomReal[1, RANDN]];//AbsoluteTiming

Prefix Sum Algorithm (1)

The scan, or prefix sum, algorithm is similar to FoldList and is a very useful primitive algorithm that can be used in a variety of scenarios. The OpenCL implementation is found in:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "Scan.cl"}]

This loads the three kernels used in computation:

Wolfram Language code:

scanExclusiveShared = OpenCLFunctionLoad[File[srcf], "scanExclusiveLocal1", {{"Integer32", "Output"}, {"Integer32", "Input"}, {"Local", "Integer32"}, "Integer32"}, 256, "Defines" -> {"WORKGROUP_SIZE" -> 256}];
scanExclusiveShared2 = OpenCLFunctionLoad[File[srcf], "scanExclusiveLocal2", {{"Integer32", "InputOutput"}, {"Integer32",   "Output"}, {"Integer32", "Input"}, {"Local", "Integer32"}, "Integer32", "Integer32"}, 256, "Defines" -> {"WORKGROUP_SIZE" -> 256}];
uniformUpdate = OpenCLFunctionLoad[File[srcf], "uniformUpdate", {{"Integer32", "InputOutput"}, {"Integer32", "InputOutput"}}, 256, "Defines" -> {"WORKGROUP_SIZE" -> 256}];

This generates random input data:

Wolfram Language code: data = RandomInteger[10, 256];

This allocates the output buffer:

Wolfram Language code: dest = OpenCLMemoryAllocate[Integer, 256];

This computes the block and grid dimensions:

Wolfram Language code:

blockDim = 256;
gridDim = blockDim * Ceiling[Length[data] / (4 * blockDim)];

A temporary buffer is needed in computation:

Wolfram Language code: buffer = OpenCLMemoryAllocate[Integer, 1 + (gridDim / blockDim)];

This performs the scan operation:

Wolfram Language code:

scanExclusiveShared[dest, data, 512, 4 * blockDim, gridDim];
scanExclusiveShared2[buffer, dest, data, 512, 1 + (gridDim / blockDim), 1 + (gridDim / blockDim)];
uniformUpdate[dest, buffer];

This retrieves the output buffer:

Wolfram Language code: OpenCLMemoryGet[dest]

This deallocates the OpenCLMemory elements:

Wolfram Language code: OpenCLMemoryUnload[dest, buffer]
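
For checking the GPU output, a sequential reference is useful. The sketch below assumes the scan is exclusive, as the kernel names suggest, so each output element is the sum of all inputs strictly before it. `exclusive_scan` is a hypothetical name.

```c
/* Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0. */
void exclusive_scan(const int *in, int *out, int n)
{
    int running = 0;
    for (int i = 0; i < n; i++) {
        out[i] = running;   /* sum of everything before position i */
        running += in[i];
    }
}
```

For the input {3, 1, 4, 1, 5} this produces {0, 3, 4, 8, 9}.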

Matrix Operations (1)

Matrix transpose is a fundamental algorithm in many applications. This specifies the inputs:

Wolfram Language code:

size = 5;
input = RandomReal[1., {size, size}];
output = ConstantArray[0., {size, size}];
localWorkSize = 16;
globalworkSize = Ceiling[size / localWorkSize] * localWorkSize;

This loads OpenCLFunction:

Wolfram Language code:

oclTranspose = OpenCLFunctionLoad[File[FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "transpose.cl"}]], "transpose", {{"Float", _, "Output"}, {"Float", _, "Input"}, _Integer, _Integer, _Integer, {"Shared", _Integer}}, {localWorkSize, localWorkSize}]

This calls OpenCLFunction:

Wolfram Language code: res = oclTranspose[output, input, 0, size, size, localWorkSize * localWorkSize, {globalworkSize, globalworkSize}];

This shows the MatrixForm of the result:

Wolfram Language code: First[res]//MatrixForm

The result agrees with the Wolfram Language:

Wolfram Language code: MatrixForm[Transpose[input]]

Matrix Multiplication (1)

Matrix multiplication is implemented here:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "matrixMul.cl"}]

This defines the block size:

Wolfram Language code: blockSize = 4;

This loads OpenCLFunction; note it is specified that the input must be rank 2:

Wolfram Language code:

MatrixMultiply = OpenCLFunctionLoad[File[srcf], "matrixMul", {{"Float", 2, "Output"}, {"Float", 2, "Input"}, {"Float", 2, "Input"}, {"Local", "Float"}, {"Local", "Float"}, _Integer, _Integer}, {blockSize, blockSize}, "Defines" -> {"BLOCK_SIZE" -> blockSize}]

This creates random input and allocates the output:

Wolfram Language code:

A = RandomReal[1.0, {8, 8}];
B = RandomReal[1.0, {8, 8}];
out = OpenCLMemoryAllocate["Float", {8, 8}]

This calls OpenCLFunction:

Wolfram Language code: MatrixMultiply[out, A, B, blockSize * blockSize, blockSize * blockSize, 8, 8]

This gets the output memory using OpenCLMemoryGet:

Wolfram Language code: OpenCLMemoryGet[out]//MatrixForm

The output agrees with the Wolfram Language:

Wolfram Language code: Dot[A, B]//MatrixForm

Fast Fourier Transform (1)

The one-dimensional discrete fast Fourier transform can be implemented using OpenCLLink; this implementation assumes that the input is a power of 2:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "FFT_Kernels.cl"}]

This loads OpenCLFunction using OpenCLFunctionLoad:

Wolfram Language code: fft = OpenCLFunctionLoad[File[srcf], "kfft", {{"Float", _, "InputOutput"}, {"Float", _, "Output"}}, 64]

This creates input and output lists:

Wolfram Language code:

in = RandomReal[1.0, 1024];
out = ConstantArray[0.0, 1024];

This calls the function and combines the two returned lists into a complex list, displaying only the first 50 elements:

Wolfram Language code: MapThread[Complex, fft[in, out, 64]][[ ;; 50]]

The result agrees with Fourier:

Wolfram Language code: Fourier[in, FourierParameters -> {1, -1}][[ ;; 50]]

Financial Derivative (1)

Black–Scholes models financial derivative investments and is implemented in OpenCL:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "BlackScholes.cl"}]

This loads OpenCLFunction:

Wolfram Language code:

BlackScholes = OpenCLFunctionLoad[File[srcf], "BlackScholes", {{_Real, _, "Output"}, {_Real, _, "Output"}, {_Real, _, "Input"}, {_Real, _, "Input"}, {_Real, _, "Input"}, _Real, _Real, _Integer}, 128, "TargetPrecision" -> "Single"]

This assigns the input parameters:

Wolfram Language code:

numberOfOptions = 64;
call = OpenCLMemoryAllocate["Float", numberOfOptions];
put = OpenCLMemoryAllocate["Float", numberOfOptions];
currentPrices = RandomReal[{25.0, 35.0}, numberOfOptions];
strikePrices = RandomReal[{20.0, 40.0}, numberOfOptions];
strikeTimes = RandomReal[{0.1, 10.0}, numberOfOptions];
riskFree = 0.02;
volatility = 0.30;

This invokes OpenCLFunction:

Wolfram Language code:

BlackScholes[call, put, currentPrices, strikePrices, strikeTimes, riskFree, volatility, numberOfOptions, numberOfOptions]

This gets the call values:

Wolfram Language code: OpenCLMemoryGet[call]

The result agrees with the output of FinancialDerivative:

Wolfram Language code:

MapThread[FinancialDerivative[{"European", "Call"}, {"StrikePrice" -> #1, "Expiration" -> #2},   {"InterestRate" -> riskFree, "Volatility"  -> volatility, "CurrentPrice" -> #3}]&, {strikePrices, strikeTimes, currentPrices}]

For timing, the number of options to be valuated is increased:

Wolfram Language code:

numberOfOptions = 2048;
call = OpenCLMemoryAllocate["Float", numberOfOptions];
put = OpenCLMemoryAllocate["Float", numberOfOptions];
currentPrices = RandomReal[{25.0, 35.0}, numberOfOptions];
strikePrices = RandomReal[{20.0, 40.0}, numberOfOptions];
strikeTimes = RandomReal[{0.1, 10.0}, numberOfOptions];
riskFree = 0.02;
volatility = 0.30;

On the C2050, it takes 1/100 of a second to valuate 2048 options:

Wolfram Language code:

BlackScholes[call, put, currentPrices, strikePrices, strikeTimes, riskFree, volatility, numberOfOptions, numberOfOptions];//AbsoluteTiming

On a Core i7 950, FinancialDerivative takes 1.13 seconds. This is a speedup of 280×. Note that increasing the number of options yields a larger speedup:

Wolfram Language code:

MapThread[FinancialDerivative[{"European", "Call"}, {"StrikePrice" -> #1, "Expiration" -> #2},   {"InterestRate" -> riskFree, "Volatility"  -> volatility, "CurrentPrice" -> #3}]&, {strikePrices, strikeTimes, currentPrices}];//AbsoluteTiming

Gaussian Filter (1)

Recursive Gaussian is used to approximate the Gaussian filter. The Gaussian matrix is separable:

Wolfram Language code: GaussianMatrix[10]//ArrayPlot

It can be written as the outer product of two 1D Gaussians:

Wolfram Language code: Outer[Times, GaussianMatrix[{{10}}], GaussianMatrix[{{10}}]]//ArrayPlot

Locate the implementation of the recursive Gaussian:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "recursiveGaussian.cl"}]

Load two functions using OpenCLFunctionLoad:

Wolfram Language code:

RecusiveGaussian = OpenCLFunctionLoad[File[srcf], "RecursiveGaussian_kernel", {{"UnsignedByte[4]", _, "Input"}, {"UnsignedByte[4]", _, "Output"}, _Integer, _Integer, "Float", "Float", "Float", "Float", "Float", "Float", "Float", "Float"}, {256, 1}];
OpenCLTranspose = OpenCLFunctionLoad[File[srcf], "transpose_kernel", {{"UnsignedByte[4]", _, "Output"}, {"UnsignedByte[4]", _, "Input"}, "Local", _Integer, _Integer, _Integer}, {16, 16}];

Specify the value of σ in the Gaussian:

Wolfram Language code: σ = 5.0;

Calculate the normal distribution:

Wolfram Language code: PDF[NormalDistribution[μ, σ], p]

The Wolfram Language can plot the distribution:

Wolfram Language code: Plot[PDF[NormalDistribution[0, 5.0], x], {x, -6, 6}, Filling -> Axis]

Calculate the recursive Gaussian parameters:

Wolfram Language code:

alpha = 1.695 / σ;
ema = Exp[-alpha];
ema2 = Exp[-2 * alpha];
k = (1 - ema) * (1 - ema) / (1 + 2 * alpha * ema - ema2);
a0 = k;
a1 = k * (alpha - 1) * ema;
a2 = k * (alpha + 1) * ema;
a3 = -k * ema2;
b1 = -2 * ema;
b2 = ema2;
coefp = (a0 + a1) / (1 + b1 + b2);
coefn = (a2 + a3) / (1 + b1 + b2);

Allocate OpenCLMemory for the input, output, and temporary storage:

Wolfram Language code:

{width, height} = ImageDimensions[[image]];
input = OpenCLMemoryLoad[[image], "UnsignedByte[4]"];
temp = OpenCLMemoryAllocate["UnsignedByte[4]", {width, height}];
output = OpenCLMemoryAllocate["UnsignedByte[4]", {width, height}];

Perform the Gaussian horizontally, then transpose, then perform the Gaussian vertically, and finally transpose to get the full Gaussian:

Wolfram Language code:

RecusiveGaussian[input, temp, width, height, a0, a1, a2, a3, b1, b2, coefp, coefn, {width, 1}];
OpenCLTranspose[output, temp, 16 * 16 * 4, width, height, 16, {width, height}];
RecusiveGaussian[output, temp, width, height, a0, a1, a2, a3, b1, b2, coefp, coefn, {height, 1}];
OpenCLTranspose[output, temp, 16 * 16 * 4, width, height, 16, {width, height}];

Reconstruct the image from the data:

Wolfram Language code: Image[OpenCLMemoryGet[output], "Byte", "ColorSpace" -> "RGB"]

Again, compare the timing:

Wolfram Language code:

AbsoluteTiming[
	RecusiveGaussian[input, temp, width, height, a0, a1, a2, a3, b1, b2, coefp, coefn, {width, 1}];
OpenCLTranspose[output, temp, 16 * 16 * 4, width, height, 16, {width, height}];
RecusiveGaussian[output, temp, width, height, a0, a1, a2, a3, b1, b2, coefp, coefn, {height, 1}];
OpenCLTranspose[output, temp, 16 * 16 * 4, width, height, 16, {width, height}];]

The OpenCL version shows roughly a 4× performance boost over GaussianFilter:

Wolfram Language code: GaussianFilter[[image], {10, σ}];//AbsoluteTiming

Sorting (1)

Bitonic sort sorts a given set of integers. It is similar in principle to merge sort. The OpenCL implementation only works on lists whose length is a power of 2 and can be found here:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "bitonicSort.cl"}]

Wolfram Language code:

BitonicSort = OpenCLFunctionLoad[File[srcf], "bitonicSort", {{_Integer, _, "InputOutput"}, _Integer, _Integer, _Integer, _Integer}, 16]

This sets the length of the input and loads it. The direction denotes whether to sort from highest to lowest or lowest to highest. In this case, you sort from lowest to highest:

Wolfram Language code:

width = 64;
list = OpenCLMemoryLoad[Reverse[Range[width]]];
direction = 1;

This gets the input list:

Wolfram Language code: OpenCLMemoryGet[list]

As with merge sort, a full sort needs several passes, so the bitonic sort kernel is called repeatedly:

Wolfram Language code:

For[stage = 0, stage < Log[2, width], stage++, 
	For[passOfStage = 0, passOfStage <= stage, passOfStage++, BitonicSort[list, stage, passOfStage, width, direction]
	]
	]

The output list is retrieved sorted:

Wolfram Language code: OpenCLMemoryGet[list]
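
The stage and pass structure of the loop above can be seen in a sequential C version of the textbook bitonic sorting network. This is a sketch, not the code in bitonicSort.cl; the function name is hypothetical, and the length must be a power of 2.

```c
/* Sequential bitonic sort for n a power of 2.
   ascending != 0 sorts lowest to highest, otherwise highest to lowest. */
void bitonic_sort(int *a, int n, int ascending)
{
    for (int k = 2; k <= n; k <<= 1) {          /* stage: merge runs of length k */
        for (int j = k >> 1; j > 0; j >>= 1) {  /* pass: compare at distance j */
            for (int i = 0; i < n; i++) {
                int partner = i ^ j;
                if (partner > i) {
                    /* direction of this run within the current stage */
                    int up = ((i & k) == 0) == (ascending != 0);
                    if ((up && a[i] > a[partner]) ||
                        (!up && a[i] < a[partner])) {
                        int tmp = a[i];
                        a[i] = a[partner];
                        a[partner] = tmp;
                    }
                }
            }
        }
    }
}
```

Stage number s (counting from 0) performs s + 1 passes, which matches the bounds of the nested For loops in the Wolfram Language code. On a GPU, the innermost loop over i is what runs in parallel.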

Possible Issues (5)

The maximum work item sizes (block dimensions) are returned by OpenCLInformation:

Wolfram Language code: OpenCLInformation[$OpenCLPlatform, $OpenCLDevice, "Maximum Work Item Sizes"]

On some systems, this can be limited to 1.

To use double-precision operations in the OpenCL code, the user must place the following pragmas in the code header:

#ifdef USING_DOUBLE_PRECISIONQ
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif /* USING_DOUBLE_PRECISIONQ */

Errors in the function call can place OpenCLLink in an unusable state. This is a side effect of allowing users to write arbitrary kernels. Infinite loops, buffer overflows, etc. in the kernel code can make both OpenCLLink and the video driver unstable. In an extreme case, this may crash the display driver, but usually it just makes further evaluation of OpenCL code return invalid results.

Bugs in some OpenCL implementations may cause the kernel to crash if one of the "IncludeDirectories" contains a space.

Use of memory modifiers such as is not supported by OpenCLLink. Memory passed into an OpenCLFunction must be .

Interactive Examples (5)

Mandelbrot Set (1)

The Mandelbrot set consists of the complex numbers c for which the recurrence z(n+1) = z(n)^2 + c, starting from z(0) = 0, stays bounded. The following implements the set in OpenCL (a slightly more complicated coloring strategy is used to ensure that colors transition smoothly):

Wolfram Language code:

src = "
__kernel void mandelbrot_kernel(__global mint * set, float zoom, float bailout, mint width, mint height) {
   int xIndex = get_global_id(0);
   int yIndex = get_global_id(1);
   int ii;

   float x0 = zoom*(width/3 - xIndex);
   float y0 = zoom*(height/2 - yIndex);
   float tmp, x = 0, y = 0;
   float c;

   if (xIndex < width && yIndex < height) {
       for (ii = 0; (x*x+y*y <= bailout) && (ii < MAX_ITERATIONS); ii++) {
            tmp = x*x - y*y +x0;
            y = 2*x*y + y0;
            x = tmp;
        }
        c = ii - log(log(sqrt(x*x + y*y)))/log(2.0f);
        if (ii == MAX_ITERATIONS) {
            set[3*(xIndex + yIndex*width)] = 0;
            set[3*(xIndex + yIndex*width) + 1] = 0;
            set[3*(xIndex + yIndex*width) + 2] = 0;
        } else {
            set[3*(xIndex + yIndex*width)] = ii*c/4 + 20;
            set[3*(xIndex + yIndex*width) + 1] = ii*c/4;
            set[3*(xIndex + yIndex*width) + 2] = ii*c/4 + 5;
        }
    }
}
";

Wolfram Language code:

MandelbrotSet = OpenCLFunctionLoad[src, "mandelbrot_kernel", {{_Integer, _, "Output"}, "Float", "Float", _Integer, _Integer}, {16, 16}, "Defines" -> {"MAX_ITERATIONS" -> 100}]

Wolfram Language code:

width = 2048;
height = 1024;
mem = OpenCLMemoryAllocate[Integer, {height, width, 3}];

Wolfram Language code: res = MandelbrotSet[mem, 0.0017, 8.0, width, height, {width, height}]

Wolfram Language code: Image[OpenCLMemoryGet[First[res]], "Byte"]

Wolfram Language code:

Manipulate[
	MandelbrotSet[mem, zoom, 8.0, width, height, {width, height}];
	Image[OpenCLMemoryGet[First[res]], "Byte"], {{zoom, 0.0017}, 0.0001, 0.003, 0.0001}]
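
The inner loop of the kernel is the usual escape-time iteration, and it can be isolated as a small C function for testing individual points. The function name is hypothetical; the loop body mirrors the kernel, with (x0, y0) as the point under test.

```c
/* Number of iterations before the orbit of (x0, y0) leaves the bailout
   radius, capped at max_iterations (points that never escape). */
int escape_iterations(float x0, float y0, float bailout, int max_iterations)
{
    float x = 0.0f, y = 0.0f;
    int ii;
    for (ii = 0; (x * x + y * y <= bailout) && (ii < max_iterations); ii++) {
        float tmp = x * x - y * y + x0;   /* real part of z^2 + c */
        y = 2.0f * x * y + y0;            /* imaginary part of z^2 + c */
        x = tmp;
    }
    return ii;
}
```

Points inside the set, such as the origin, run to the iteration cap and are colored black by the kernel; points far outside escape after only a few iterations.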

Julia Set (1)

The Mandelbrot set is a restricted form of the Julia set; here is the code for the Julia set:

Wolfram Language code:

code = "
#ifdef USING_DOUBLE_PRECISIONQ
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif /* USING_DOUBLE_PRECISIONQ */
__kernel void julia_kernel(__global Real_t * set, mint width, mint height, Real_t cx, Real_t cy) {
   int xIndex = get_global_id(0);
   int yIndex = get_global_id(1);
   int ii;

   Real_t x = ZOOM_LEVEL*(width/2 - xIndex);
   Real_t y = ZOOM_LEVEL*(height/2 - yIndex);
   Real_t tmp;
   Real_t c;

   if (xIndex < width && yIndex < height) {
       for (ii = 0; ii < MAX_ITERATIONS && x*x + y*y < BAILOUT; ii++) {
			tmp = x*x - y*y + cx;
			y = 2*x*y + cy;
			x = tmp;
		}
		c = log(0.1f + sqrt(x*x + y*y));
		set[xIndex + yIndex*width] = c;
    }
}
";

This defines the input memory and parameters:

Wolfram Language code:

{width, height} = {512, 512};
jset = OpenCLMemoryAllocate[Real, {height, width}];

This loads OpenCLFunction:

Wolfram Language code:

JuliaCalculate = OpenCLFunctionLoad[code, "julia_kernel", {{_Real, _, "Output"}, _Integer, _Integer, _Real, _Real}, {16, 16}, "Defines" -> {"MAX_ITERATIONS" -> 10, "ZOOM_LEVEL" -> "0.0050", "BAILOUT" -> "4.0"}];

This computes the Julia set and plots it using ReliefPlot:

Wolfram Language code:

Manipulate[
	JuliaCalculate[jset, width, height, c[[1]], c[[2]], {width, height}];
	ReliefPlot[OpenCLMemoryGet[jset], DataRange -> {{-2.0, 2.0}, {-2.0, 2.0}}, ImageSize -> 256, ColorFunction -> "SunsetColors"], 
	{{c, {0, 1}}, {-2, -2}, {2, 2}, Locator}]

This computes the Julia set and displays it as a grayscale image:

Wolfram Language code:

Manipulate[JuliaCalculate[jset, width, height, c, d, {width, height}]; Image[OpenCLMemoryGet[jset], ImageSize -> 256], {{c, 0.0}, -2.0, 2.0, Slider}, {{d, 0.0}, -2.0, 2.0, Slider}]

Image Adjustment (1)

ImageAdjust rescales pixel values from a given input range to an output range, with optional gamma correction. The following defines a simplified version of ImageAdjust in OpenCL:

Wolfram Language code:

src = "
mint xclamp(mint val, mint low, mint high) {
	return val <= low ? low : (val >= high ? high : val);
}

mint adjust(mint pixel, float lowIn, float highIn, float lowOut, float highOut, float gamma) {

	float res, val;
	val = xclamp(pixel, lowIn, highIn);
	
	res = pow((val - lowIn) / (highIn - lowIn), gamma);
	res = res * (highOut - lowOut) + lowOut;
	
	return res + 0.5f;
}

__kernel void imageAdjust(__global mint * img, mint width, mint height, mint channels, float lowIn, float highIn, float lowOut, float highOut, float gamma) {
	
	int xIndex = get_global_id(0);
	int yIndex = get_global_id(1);
	if (xIndex >= width || yIndex >= height)
		return ;
	int pos = channels * (yIndex * width + xIndex);
	for (int ii = 0; ii < channels; ii++) {
		img[pos + ii] = adjust(img[pos + ii], 255*lowIn, 255*highIn, 255*lowOut, 255*highOut, gamma);
	}
}";

This loads the OpenCLFunction:

Wolfram Language code:

cOpenCLImageAdjust = OpenCLFunctionLoad[src, "imageAdjust", {{_Integer}, _Integer, _Integer, _Integer, "Float", "Float", "Float", "Float", "Float"}, {16, 16}];

This defines a simple Wolfram Language wrapper function to make the OpenCL function have similar syntax to ImageAdjust:

Wolfram Language code:

OpenCLImageAdjust[img_Image, {lowIn_Real, highIn_Real}, gamma_ : 1.0] /; Head[gamma] === Real := OpenCLImageAdjust[img, {lowIn, highIn}, {0.0, 1.0}, gamma]
OpenCLImageAdjust[img_Image, {lowIn_Real, highIn_Real}, {lowOut_Real, highOut_Real}, gamma_ : 1.0] := 
	Module[{width, height, channels}, 
	(* ImageDimensions returns {width, height} *)
	{width, height, channels} = Flatten[{ImageDimensions[img], ImageChannels[img]}];
	cOpenCLImageAdjust[img, width, height, channels, lowIn, highIn, lowOut, highOut, gamma, {width, height}]//First
	]

This adjusts the image by rescaling the values between 0.3 and 0.8 to 0.0 and 1.0:

Wolfram Language code: OpenCLImageAdjust[[image], {0.3, 0.8}]

This adjusts the image by rescaling the values using Manipulate:

Wolfram Language code: Manipulate[OpenCLImageAdjust[[image], {0.1, high}], {high, 0.11, 1.0, 0.01}]

This performs the same adjustment with the output range {0.0, 1.0} given explicitly:

Wolfram Language code: OpenCLImageAdjust[[image], {0.3, 0.8}, {0.0, 1.0}]

Bouncing Ball (1)

In this example, you compute the position of each particle in a box with varying initial forces. You delegate the particle physics simulation to OpenCL, while all the rest is done in the Wolfram Language:

Wolfram Language code:

BallBounceEffect[bb1_] := 
	Module[{tsize, fsize, z, r1, r, v, acc, device, state, BlockDim, GridDim, res, vc}, 
	tsize = 100;
	fsize = 80;
	z = Table[fsize - 2 + RandomReal[2], {i, tsize * tsize}];
	r1 = Table[.15 + .1 * Sin[2 * N[Pi] * (i + j) / tsize], {i, tsize}, {j, tsize}];
	First[r1];
	r = Flatten[r1];
	v = ConstantArray[0.0, tsize * tsize];
	acc = 2;
	device = Automatic;
	state = ConstantArray[1, tsize * tsize];
	
	BlockDim = 256;
	GridDim = Ceiling[(tsize * tsize) / BlockDim] * BlockDim;
	vc = Flatten[
	Table[
	RGBColor[0, .5 + (.25 - r[[(i - 1) * tsize + j]]) * 2, .5 + (.25 - r[[(i - 1) * tsize + j]]) * 2 ], {i, tsize}, {j, tsize}]
	];
	Graphics3D[{AbsolutePointSize[0], 
	Point[Dynamic[Refresh[
	res = bb1[v, z, r, state, acc, tsize, GridDim];
	z = res[[2]];
	v = res[[1]];
	state = res[[3]];
	Flatten[Table[{i, j, z[[(i - 1) * tsize + j]]}, {i, tsize}, {j, tsize}], 1], UpdateInterval -> 0]], VertexColors -> vc], Sphere[{tsize / 2, tsize / 2, -10}, .05], Sphere[{tsize / 2, tsize / 2, fsize}, .05], 
	{Black, Polygon[{{tsize, 0, 0}, {tsize, tsize, 0}, {0, tsize, 0}, {0, 0, 0}}]}
	}, Boxed -> False]
	
	]
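
BallBounceEffect rounds the launch size up to a multiple of the block size, and the kernel's bounds check discards the extra work items. A minimal C sketch of the same rounding in integer arithmetic follows; the function name round_up_to_multiple is hypothetical:

```c
#include <stddef.h>

/* Smallest multiple of block_dim that is >= n (block_dim must be > 0).
   Integer form of Ceiling[n / BlockDim] * BlockDim. */
size_t round_up_to_multiple(size_t n, size_t block_dim) {
    return ((n + block_dim - 1) / block_dim) * block_dim;
}
```

For tsize = 100 and BlockDim = 256, the 10000 particles are covered by 10240 work items, of which the last 240 do nothing.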

This defines the OpenCL code and loads the function into the Wolfram Language:

Wolfram Language code:

BallBouncePatternDemo[] := 
	Module[{code, bb1, BlockDim}, 
	code = "
  __kernel void bb(__global float* v, __global float* z, __global float* r, __global mint* state, mint ac, mint size) {
      int i = get_global_id(0);
      float acc = ac / 10.0f;
      if (i < size * size) {
          if (v[i] <= 0) {
              v[i] = 0;
              state[i] = 1;
          }
          if (z[i] <= 0) {
              z[i] = 0;
              state[i] = -1;
          }
          v[i] += state[i] * acc;
          z[i] -= state[i] * v[i] * r[i] / 0.25f;
      }
  }";
	BlockDim = 256;
	bb1 = OpenCLFunctionLoad[code, "bb", {{"Float"}, {"Float"}, {"Float", _, "Input"}, {_Integer}, _Integer, _Integer}, {BlockDim}];
	Mouseover[Graphics[{LightGray, Circle[], Inset[Style["Bring Mouse Here", Bold, Blue]]}], 
	BallBounceEffect[bb1]
	]
	]

Wolfram Language code: BallBouncePatternDemo[]
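
The update rule applied to each particle can be sketched in plain C for a single particle. This is an illustration only: the Ball struct and the name ball_step are hypothetical, and acc corresponds to ac/10 in the kernel:

```c
/* State of one particle: speed v, height z, and direction flag state
   (+1 while falling, -1 while rising after a bounce). */
typedef struct { float v; float z; int state; } Ball;

/* Advance one particle by one step. r scales the step for this particle
   and acc is the change in speed per step. */
void ball_step(Ball *b, float r, float acc) {
    if (b->v <= 0.0f) { b->v = 0.0f; b->state = 1; }    /* at rest: start falling */
    if (b->z <= 0.0f) { b->z = 0.0f; b->state = -1; }   /* hit the floor: bounce */
    b->v += b->state * acc;
    b->z -= b->state * b->v * r / 0.25f;
}
```

A particle released at rest speeds up while its height drops. After it reaches the floor, the flag flips, so it rises while slowing down until the speed reaches zero and the cycle repeats.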


N-Body Simulation (1)

The N-body simulation is a classic Newtonian problem. An OpenCL implementation is included with OpenCLLink; this locates its source file:

Wolfram Language code: srcf = FileNameJoin[{$OpenCLLinkPath, "SupportFiles", "NBody.cl"}];

This loads the OpenCLFunction:

Wolfram Language code:

NBody = OpenCLFunctionLoad[File[srcf], "nbody_sim", {{"Float[4]", _, "Input"}, {"Float[4]", _, "Input"}, _Integer, "Float", "Float", {"Local", "Float"}, {"Float[4]", _, "Output"}, {"Float[4]", _, "Output"}}, 256]

The number of particles, time step, and epsilon distance are chosen:

Wolfram Language code:

numParticles = 1024;
deltaT = 0.05;
epsSqrt = 50.0;

This sets the input and output memories:

Wolfram Language code:

pos = OpenCLMemoryLoad[RandomReal[512, {numParticles, 4}], "Float[4]"];
vel = OpenCLMemoryLoad[RandomReal[1, {numParticles, 4}], "Float[4]"];
newPos = OpenCLMemoryAllocate["Float[4]", {numParticles}];
newVel = OpenCLMemoryAllocate["Float[4]", {numParticles}];

This calls the NBody function:

Wolfram Language code:

NBody[pos, vel, numParticles, deltaT, epsSqrt, 256 * 4, newPos, newVel, 1024];
NBody[newPos, newVel, numParticles, deltaT, epsSqrt, 256 * 4, pos, vel, 1024];

This plots the body points:

Wolfram Language code: Graphics3D[Point[Take[#, 3]& /@ OpenCLMemoryGet[pos]]]

This shows the result as a Dynamic:

Wolfram Language code:

Graphics3D[Point[
	Dynamic[Refresh[
	NBody[pos, vel, numParticles, deltaT, epsSqrt, 256 * 4, newPos, newVel, 1024];
	NBody[newPos, newVel, numParticles, deltaT, epsSqrt, 256 * 4, pos, vel, 1024];
	Take[#, 3]& /@ OpenCLMemoryGet[pos], UpdateInterval -> 0]]]]

Neat Examples (1)

SymbolicC (1)

OpenCLLink can use SymbolicC's code generation capabilities. SymbolicC must be loaded first:

Wolfram Language code: Needs["SymbolicC`"]

Here a function toSymbolicC is defined that takes a Wolfram Language expression and translates it to a SymbolicC expression. It does not handle every Wolfram Language construct, but further rules can be added as needed:

Wolfram Language code:

ClearAll[toSymbolicC]
SetAttributes[toSymbolicC, {HoldAll}]
toSymbolicC[x_List] := toSymbolicC /@ x
toSymbolicC[Times[-1, x_]] := "-" <> GenerateCode[toSymbolicC[x]]
toSymbolicC[(op : (Plus | Times))[args___]] := COperator[op, toSymbolicC[{args}]]
toSymbolicC[(op : (Minus | BitNot | Not | Decrement | Increment | PreDecrement | PreIncrement))[x_]] := COperator[op, toSymbolicC[x]]
toSymbolicC[(op : (Mod | Divide | Subtract | BitShiftRight | BitShiftLeft))[x_, y_]] := COperator[op, {toSymbolicC[x], toSymbolicC[y]}]
toSymbolicC[(op : (ArcCos | ArcSin | Ceiling | Cos | Cosh | Exp | Abs | Floor | Sin | Sinh | Sqrt | Tan | Tanh | Log))[x_]] := CStandardMathOperator[op, toSymbolicC[x]]
toSymbolicC[Power[x_, r : Rational[_, _]]] := CStandardMathOperator[Power, {toSymbolicC[x], toSymbolicC[r]}]
toSymbolicC[Power[x_, 2]] := COperator[Times, {toSymbolicC[x], toSymbolicC[x]}]
toSymbolicC[Power[x_, y_]] := CStandardMathOperator[Power, {toSymbolicC[x], toSymbolicC[y]}]
toSymbolicC[CompoundExpression[stmts__]] := toSymbolicC[{stmts}]
toSymbolicC[If[cond_, trueStmt_]] := CIf[toSymbolicC[cond], toSymbolicC[trueStmt]]
toSymbolicC[If[cond_, trueStmt_, falseStmt_]] := CIf[toSymbolicC[cond], toSymbolicC[trueStmt], toSymbolicC[falseStmt]]
toSymbolicC[x_] := x

Wolfram Language expressions can be transformed:

Wolfram Language code: toSymbolicC[Cos[x] ^ 2 + Sin[x] * 2 + x ^ 8 + 3]

To translate the result to C code, use ToCCodeString:

Wolfram Language code: ToCCodeString[%]

You can combine this with OpenCLLink's symbolic code generation capabilities to create an OpenCLMapSource function:

Wolfram Language code:

SetAttributes[OpenCLMapSource, {HoldAll}];
OpenCLMapSource[f_] := ToCCodeString[With[{fun = f[xx] /. xx -> CArray["in", "index"]}, SymbolicOpenCLFunction["map", {{CPointerType[{"__global", "mint"}], "in"}, {CPointerType[{"__global", "mint"}], "out"}, {"mint", "length"}}, 
	CBlock[{
	SymbolicOpenCLDeclareIndexBlock[1], 
	CIf[COperator[Less, {"index", "length"}], 
	CAssign[CArray["out", "index"], toSymbolicC[fun]]
	]
	}]
	]]]

OpenCLMapSource can work with pure Wolfram Language functions:

Wolfram Language code: OpenCLMapSource[# + Sin[#]&]

You can also use the code to work with predefined Wolfram Language functions:

Wolfram Language code: myFun[x_] := Cos[x] ^ 2 + Sin[x] * 2 + x ^ 8 + 3

Wolfram Language code: OpenCLMapSource[myFun]

The above code can then be loaded using OpenCLFunctionLoad:

Wolfram Language code: addTwo = OpenCLFunctionLoad[OpenCLMapSource[# + 2&], "map", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer}, 256]

The function can be evaluated:

Wolfram Language code: addTwo[ConstantArray[1, 100], ConstantArray[1, 100], 100]

To make this general, you can implement an OpenCLMap function:

Wolfram Language code:

SetAttributes[OpenCLMap, HoldFirst];
OpenCLMap[fun_, input_List] := 
	Module[{len = Length[input], res, output, oclFun}, 
	output = OpenCLMemoryAllocate[Integer, Length[input]];
	oclFun = OpenCLFunctionLoad[OpenCLMapSource[fun], "map", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer}, 256];
	oclFun[input, output, len];
	res = OpenCLMemoryGet[output];
	OpenCLMemoryUnload[output];
	res
	]

The function can now be evaluated. This reproduces the addTwo function:

Wolfram Language code: OpenCLMap[# + 2&, ConstantArray[1, 100]]

Here, the BitNot operator is used:

Wolfram Language code: OpenCLMap[BitNot, ConstantArray[1, 100]]
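
As a point of comparison, the element-wise pattern that the generated "map" kernel follows can be sketched on the CPU in plain C. This is an analogue for illustration, not the output of OpenCLMapSource; the names MapFn, map_apply, add_two, and bit_not are hypothetical, and long stands in for mint:

```c
#include <stddef.h>

typedef long (*MapFn)(long);

/* CPU analogue of the "map" kernel: every index below length writes
   f(in[index]) to out[index]. On the GPU each index is one work item. */
void map_apply(MapFn f, const long *in, long *out, size_t length) {
    for (size_t index = 0; index < length; index++) {
        out[index] = f(in[index]);
    }
}

long add_two(long x) { return x + 2; }   /* counterpart of # + 2 & */
long bit_not(long x) { return ~x; }      /* counterpart of BitNot */
```

Applied to a list of ones, add_two yields 3 for every element and bit_not yields -2, which matches BitNot[1] in the Wolfram Language.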
