CUDAMemoryAllocate

CUDALink`

CUDAMemoryAllocate[type,dim]

gives CUDAMemory of the specified type with a single dimension of length dim.

CUDAMemoryAllocate[type,{dim₁,dim₂,…}]

gives CUDAMemory of the specified type with dimensions {dim₁,dim₂,…}.

Details and Options

The CUDALink application must be loaded using Needs["CUDALink`"].
Possible types for CUDAMemoryAllocate are:

Integer	Real	Complex
"Byte"	"Bit16"	"Integer32"
"Byte[2]"	"Bit16[2]"	"Integer32[2]"
"Byte[3]"	"Bit16[3]"	"Integer32[3]"
"Byte[4]"	"Bit16[4]"	"Integer32[4]"
"UnsignedByte"	"UnsignedBit16"	"UnsignedInteger"
"UnsignedByte[2]"	"UnsignedBit16[2]"	"UnsignedInteger[2]"
"UnsignedByte[3]"	"UnsignedBit16[3]"	"UnsignedInteger[3]"
"UnsignedByte[4]"	"UnsignedBit16[4]"	"UnsignedInteger[4]"
"Double"	"Float"	"Integer64"
"Double[2]"	"Float[2]"	"Integer64[2]"
"Double[3]"	"Float[3]"	"Integer64[3]"
"Double[4]"	"Float[4]"	"Integer64[4]"

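As a rough guide to how much device memory an allocation requests, the element size can be multiplied by the number of elements. The sketch below is illustrative only. It assumes the usual scalar widths (1 byte for "Byte", 2 for "Bit16", 4 for "Integer32", "UnsignedInteger" and "Float", 8 for "Integer64" and "Double") and that a vector type such as "Float[4]" packs four scalars per element. The names scalarBytes, elementBytes and estimateBytes are hypothetical helpers, not CUDALink functions, and the symbolic types Integer, Real and Complex are not covered because their width depends on the system and device.

```wolfram
(* Hypothetical helpers, not part of CUDALink. Scalar widths in bytes are assumptions. *)
scalarBytes = <|
   "Byte" -> 1, "UnsignedByte" -> 1,
   "Bit16" -> 2, "UnsignedBit16" -> 2,
   "Integer32" -> 4, "UnsignedInteger" -> 4, "Float" -> 4,
   "Integer64" -> 8, "Double" -> 8|>;

(* "Float[4]" splits into {"Float", "4"}; a plain "Float" splits into {"Float"} *)
elementBytes[type_String] := Replace[StringSplit[type, {"[", "]"}], {
    {name_} :> scalarBytes[name],
    {name_, n_} :> scalarBytes[name]*FromDigits[n]}]

(* dims may be a single integer or a list of integers *)
estimateBytes[type_String, dims_] := elementBytes[type]*Times @@ Flatten[{dims}]

estimateBytes["Float[4]", {4}]           (* 4 elements of 16 bytes each: 64 *)
estimateBytes["Integer64", {10, 10, 10}] (* 1000 elements of 8 bytes each: 8000 *)
```

A type name that is not in the table yields a Missing[…] expression instead of a number, which makes an unsupported name easy to spot.
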
The following options can be given:
"Device"	$CUDADevice	CUDA device used in computation
"TargetPrecision"	Automatic	precision used in computation

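A minimal sketch of passing the "Device" option explicitly. It assumes a CUDA-capable system on which device index 1 exists; the index is an assumption about the machine, not something this page specifies.

```wolfram
Needs["CUDALink`"]

(* Only attempt the allocation when CUDALink reports a usable CUDA setup *)
If[CUDAQ[],
 (* Assumption: device 1 is present; adjust the index for other hardware *)
 mem = CUDAMemoryAllocate["Float", {256}, "Device" -> 1];
 Print[CUDAMemoryInformation[mem]];
 (* Release the allocation once it is no longer needed *)
 CUDAMemoryUnload[mem]
 ]
```
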
Examples

Basic Examples (4)

First, load the CUDALink application:

Wolfram Language code: Needs["CUDALink`"]

This allocates a rank-3 tensor with 10 elements along each dimension:

Wolfram Language code: mem = CUDAMemoryAllocate[Integer, {10, 10, 10}]

Information about memory can be retrieved via CUDAMemoryInformation:

Wolfram Language code: CUDAMemoryInformation[mem]

This unloads the memory:

Wolfram Language code: CUDAMemoryUnload[mem]

For a single dimension, the length can be an integer:

Wolfram Language code: CUDAMemoryAllocate[Integer, 10]

As with CUDAMemoryLoad, different types are supported:

Wolfram Language code: CUDAMemoryAllocate["Float[4]", {4}]

Allocating memory as Real or Complex selects the element type according to whether the device supports double precision:

Wolfram Language code: CUDAMemoryAllocate[Real, 3]

In this case, the CUDA device has double-precision support:

Wolfram Language code: CUDAInformation[$CUDADevice, "Compute Capabilities"]

The default choice can be overridden by setting the "TargetPrecision" option:

Wolfram Language code: CUDAMemoryAllocate[Real, 3, "TargetPrecision" -> "Single"]
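
The effect of "TargetPrecision" can be checked by comparing the reported element type of two allocations. This is a sketch under assumptions: it presumes that CUDAMemoryInformation returns rules with the keys "Type" and "ByteCount", and that the device supports double precision (the "Double" request may fail otherwise).

```wolfram
Needs["CUDALink`"]

If[CUDAQ[],
 single = CUDAMemoryAllocate[Real, 3, "TargetPrecision" -> "Single"];
 (* Assumption: the device supports double precision *)
 double = CUDAMemoryAllocate[Real, 3, "TargetPrecision" -> "Double"];
 (* Assumption: "Type" and "ByteCount" are keys in the information rules *)
 Print[Table[{"Type", "ByteCount"} /. CUDAMemoryInformation[m], {m, {single, double}}]];
 (* Release both allocations *)
 Scan[CUDAMemoryUnload, {single, double}]
 ]
```

If the assumptions hold, the single-precision allocation should report a smaller byte count than the double-precision one.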

Applications (1)

This defines a CUDA kernel that sets all elements of a list to 0:

Wolfram Language code:

src = "__global__ void zero(mint * A, mint length) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < length)
        A[index] = 0;
}";

This allocates the required memory:

Wolfram Language code:

len = 100;
mem = CUDAMemoryAllocate[Integer, {len}]

This loads the function:

Wolfram Language code: zero = CUDAFunctionLoad[src, "zero", {{_Integer}, _Integer}, 32]

This runs the function:

Wolfram Language code: zero[mem, len]

This shows information about the memory; note that the "DeviceStatus" is "Synchronized":

Wolfram Language code: CUDAMemoryInformation[mem]

This gets the memory from the GPU:

Wolfram Language code: CUDAMemoryGet[mem]

This shows information about the memory; note that the "DeviceStatus" and "HostStatus" are "Synchronized":

Wolfram Language code: CUDAMemoryInformation[mem]
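
The steps of this application can be gathered into one guarded sketch that also checks the result and releases the memory afterward. It uses only the functions shown on this page plus CUDAQ, and it assumes a working CUDA compiler setup so that CUDAFunctionLoad can compile the kernel.

```wolfram
Needs["CUDALink`"]

(* Kernel source from the example: each thread zeroes one element *)
src = "__global__ void zero(mint * A, mint length) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < length)
        A[index] = 0;
}";

If[CUDAQ[],
 len = 100;
 mem = CUDAMemoryAllocate[Integer, {len}];          (* uninitialized device memory *)
 zero = CUDAFunctionLoad[src, "zero", {{_Integer}, _Integer}, 32];
 zero[mem, len];                                    (* run the kernel on the allocation *)
 result = CUDAMemoryGet[mem];                       (* copy the data back to the host *)
 CUDAMemoryUnload[mem];                             (* release the memory *)
 result === ConstantArray[0, len]                   (* check that every element is 0 *)
 ]
```

The final comparison is expected to give True when the kernel has run over the whole list.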

Possible Issues (1)

Memory that has been allocated but never written is uninitialized, so retrieving it from the GPU returns arbitrary values:

Wolfram Language code: CUDAMemoryAllocate[Integer, {10}]//CUDAMemoryGet
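
When defined initial contents are needed, one option is to load existing data with CUDAMemoryLoad instead of allocating uninitialized memory. A minimal sketch, assuming a CUDA-capable system:

```wolfram
Needs["CUDALink`"]

If[CUDAQ[],
 (* Register a list of ten zeros with CUDALink instead of allocating unset memory *)
 mem = CUDAMemoryLoad[ConstantArray[0, 10]];
 Print[CUDAMemoryGet[mem]];   (* the contents are the loaded values, not arbitrary data *)
 CUDAMemoryUnload[mem]
 ]
```

CUDAMemoryAllocate remains the better fit when a kernel will overwrite every element anyway, since no host data has to be prepared first.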
