cuDNN
Public
cuDNN.cuDNN — Module
cuDNN
High level interface to cuDNN functions. See README.md for a design overview.
Private
cuDNN.cudnnDropoutSeed — Constant
Global seed used by cudnnDropoutForward. Set cudnnDropoutSeed[] to a positive number to always drop the same values deterministically for debugging. Note that this slows down the operation by about 40x. See cudnnDropoutForward for the full description.

cuDNN.cudnnDropoutState — Constant
The global constant cudnnDropoutState::Dict holds the random number generator state for each cuDNN handle. See cudnnDropoutForward for the full description.

cuDNN.cudnnActivationDescriptor — Type
cudnnActivationDescriptor(mode::cudnnActivationMode_t,
reluNanOpt::cudnnNanPropagation_t,
coef::Cfloat)

cuDNN.cudnnAttnDescriptor — Type
cudnnAttnDescriptor(attnMode::Cuint,
nHeads::Cint,
smScaler::Cdouble,
dataType::cudnnDataType_t,
computePrec::cudnnDataType_t,
mathType::cudnnMathType_t,
attnDropoutDesc::cudnnDropoutDescriptor_t,
postDropoutDesc::cudnnDropoutDescriptor_t,
qSize::Cint,
kSize::Cint,
vSize::Cint,
qProjSize::Cint,
kProjSize::Cint,
vProjSize::Cint,
oProjSize::Cint,
qoMaxSeqLength::Cint,
kvMaxSeqLength::Cint,
maxBatchSize::Cint,
maxBeamSize::Cint)

cuDNN.cudnnCTCLossDescriptor — Type
cudnnCTCLossDescriptor(compType::cudnnDataType_t,
normMode::cudnnLossNormalizationMode_t,
gradMode::cudnnNanPropagation_t,
maxLabelLength::Cint)

cuDNN.cudnnConvolutionDescriptor — Type
cudnnConvolutionDescriptor(pad::Vector{Cint}, stride::Vector{Cint}, dilation::Vector{Cint}, mode::cudnnConvolutionMode_t, dataType::cudnnDataType_t, groupCount::Cint, mathType::cudnnMathType_t, reorderType::cudnnReorderType_t)

cuDNN.cudnnDropoutDescriptor — Type
cudnnDropoutDescriptor(dropout::Real)

cuDNN.cudnnFilterDescriptor — Type
cudnnFilterDescriptor(dataType::cudnnDataType_t,
format::cudnnTensorFormat_t,
nbDims::Cint,
filterDimA::Vector{Cint})

cuDNN.cudnnLRNDescriptor — Type
cudnnLRNDescriptor(lrnN::Cuint,
lrnAlpha::Cdouble,
lrnBeta::Cdouble,
lrnK::Cdouble)

cuDNN.cudnnOpTensorDescriptor — Type
cudnnOpTensorDescriptor(opTensorOp::cudnnOpTensorOp_t,
opTensorCompType::cudnnDataType_t,
opTensorNanOpt::cudnnNanPropagation_t)

cuDNN.cudnnPoolingDescriptor — Type
cudnnPoolingDescriptor(mode::cudnnPoolingMode_t,
maxpoolingNanOpt::cudnnNanPropagation_t,
nbDims::Cint,
windowDimA::Vector{Cint},
paddingA::Vector{Cint},
strideA::Vector{Cint})

cuDNN.cudnnRNNDataDescriptor — Type
cudnnRNNDataDescriptor(dataType::cudnnDataType_t,
layout::cudnnRNNDataLayout_t,
maxSeqLength::Cint,
batchSize::Cint,
vectorSize::Cint,
seqLengthArray::Vector{Cint},
paddingFill::Ptr{Cvoid})

cuDNN.cudnnRNNDescriptor — Type
cudnnRNNDescriptor(algo::cudnnRNNAlgo_t,
cellMode::cudnnRNNMode_t,
biasMode::cudnnRNNBiasMode_t,
dirMode::cudnnDirectionMode_t,
inputMode::cudnnRNNInputMode_t,
dataType::cudnnDataType_t,
mathPrec::cudnnDataType_t,
mathType::cudnnMathType_t,
inputSize::Int32,
hiddenSize::Int32,
projSize::Int32,
numLayers::Int32,
dropoutDesc::cudnnDropoutDescriptor_t,
auxFlags::UInt32)

cuDNN.cudnnReduceTensorDescriptor — Type
cudnnReduceTensorDescriptor(reduceTensorOp::cudnnReduceTensorOp_t,
reduceTensorCompType::cudnnDataType_t,
reduceTensorNanOpt::cudnnNanPropagation_t,
reduceTensorIndices::cudnnReduceTensorIndices_t,
reduceTensorIndicesType::cudnnIndicesType_t)

cuDNN.cudnnSeqDataDescriptor — Type
cudnnSeqDataDescriptor(dataType::cudnnDataType_t,
nbDims::Cint,
dimA::Vector{Cint},
axes::Vector{cudnnSeqDataAxis_t},
seqLengthArraySize::Csize_t,
seqLengthArray::Vector{Cint},
paddingFill::Ptr{Cvoid})

cuDNN.cudnnSpatialTransformerDescriptor — Type
cudnnSpatialTransformerDescriptor(samplerType::cudnnSamplerType_t,
dataType::cudnnDataType_t,
nbDims::Cint,
dimA::Vector{Cint})

cuDNN.cudnnTensorDescriptor — Type
cudnnTensorDescriptor(format::cudnnTensorFormat_t,
dataType::cudnnDataType_t,
nbDims::Cint,
dimA::Vector{Cint})

cuDNN.cudnnTensorTransformDescriptor — Type
cudnnTensorTransformDescriptor(nbDims::UInt32,
destFormat::cudnnTensorFormat_t,
padBeforeA::Vector{Int32},
padAfterA::Vector{Int32},
foldA::Vector{UInt32},
direction::cudnnFoldingDirection_t)

cuDNN.cudnnActivationForward — Function
cudnnActivationForward(x; mode, nanOpt, coef, alpha)
cudnnActivationForward(x, d::cudnnActivationDescriptor; alpha)
cudnnActivationForward!(y, x; mode, nanOpt, coef, alpha, beta)
cudnnActivationForward!(y, x, d::cudnnActivationDescriptor; alpha, beta)
Return the result of the specified elementwise activation operation applied to x. Optionally y holds the result and d specifies the operation. y should be similar to x if specified. Keyword arguments alpha=1, beta=0 can be used for scaling, i.e. y .= alpha * op.(x) .+ beta * y. The following keyword arguments specify the operation if d is not given:
mode = CUDNN_ACTIVATION_RELU: options are SIGMOID, RELU, TANH, CLIPPED_RELU, ELU, IDENTITY
nanOpt = CUDNN_NOT_PROPAGATE_NAN: NaN propagation policy, the other option is CUDNN_PROPAGATE_NAN
coef = 1: when the activation mode is set to CUDNN_ACTIVATION_CLIPPED_RELU, this input specifies the clipping threshold; when the activation mode is set to CUDNN_ACTIVATION_ELU, it specifies the alpha parameter
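
The scaling rule y .= alpha * op.(x) .+ beta * y can be checked on the CPU without a GPU. The following is a minimal sketch in plain Julia; activation_reference! and relu are hypothetical helper names introduced here for illustration and are not part of cuDNN.jl.

```julia
# CPU model of the documented formula; not the cuDNN.jl implementation.
relu(v) = max(v, zero(v))

# Hypothetical helper: applies y .= alpha * op.(x) .+ beta * y in place.
function activation_reference!(y, x; op=relu, alpha=1, beta=0)
    y .= alpha .* op.(x) .+ beta .* y
    return y
end

x = Float32[-2, -1, 0, 1, 2]
y = ones(Float32, 5)

activation_reference!(y, x)                   # beta=0: previous contents of y are discarded
# y == [0, 0, 0, 1, 2]
activation_reference!(y, x; alpha=2, beta=1)  # accumulate 2*relu(x) onto the existing y
# y == [0, 0, 0, 3, 6]
```

With beta=0 the old contents of y do not contribute, which is why the allocating form cudnnActivationForward has no beta keyword.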

cuDNN.cudnnActivationForward! — Function
See cudnnActivationForward, which documents both the allocating and the in-place methods.

cuDNN.cudnnAddTensor — Function
cudnnAddTensor(x, b; alpha)
cudnnAddTensor!(y, x, b; alpha, beta)
Broadcast-add tensor b to tensor x. alpha=1, beta=1 are used for scaling, i.e. y .= alpha * b .+ beta * x. cudnnAddTensor allocates a new array for the answer, cudnnAddTensor! overwrites y. Not all valid broadcasting dimensions are supported. For more flexible broadcast operations see cudnnOpTensor.
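
As an illustration of the formula y .= alpha * b .+ beta * x, here is a CPU-only sketch that adds a per-channel bias to a (W, H, C, N) array. addtensor_reference! is a hypothetical name used only in this example; the shapes chosen are one broadcasting pattern, assumed here to be typical.

```julia
# CPU model of the documented formula; not the cuDNN.jl implementation.
function addtensor_reference!(y, x, b; alpha=1, beta=1)
    y .= alpha .* b .+ beta .* x
    return y
end

x = reshape(Float32.(1:12), 2, 2, 3, 1)        # (W, H, C, N)
b = reshape(Float32[10, 20, 30], 1, 1, 3, 1)   # one bias value per channel
y = addtensor_reference!(similar(x), x, b)

# Channel c of y equals channel c of x shifted by b[c],
# e.g. y[1, 1, 2, 1] == x[1, 1, 2, 1] + 20.
```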

cuDNN.cudnnAddTensor! — Function
See cudnnAddTensor, which documents both the allocating and the in-place methods.

cuDNN.cudnnConvolutionBwdDataAlgoPerf — Function
cudnnConvolutionBwdDataAlgoPerf(wDesc, w, dyDesc, dy, convDesc, dxDesc, dx, allocateTmpBuf=true)
allocateTmpBuf controls whether a temporary buffer is allocated for the input gradient dx. It can be set to false when beta is zero to save an allocation and must otherwise be set to true.

cuDNN.cudnnConvolutionBwdFilterAlgoPerf — Function
cudnnConvolutionBwdFilterAlgoPerf(xDesc, x, dyDesc, dy, convDesc, dwDesc, dw, allocateTmpBuf=true)
allocateTmpBuf controls whether a temporary buffer is allocated for the weight gradient dw. It can be set to false when beta is zero to save an allocation and must otherwise be set to true.

cuDNN.cudnnConvolutionForward — Function
cudnnConvolutionForward(w, x; bias, activation, mode, padding, stride, dilation, group, mathType, reorderType, alpha, beta, z, format)
cudnnConvolutionForward(w, x, d::cudnnConvolutionDescriptor; bias, activation, alpha, beta, z, format)
cudnnConvolutionForward!(y, w, x; bias, activation, mode, padding, stride, dilation, group, mathType, reorderType, alpha, beta, z, format)
cudnnConvolutionForward!(y, w, x, d::cudnnConvolutionDescriptor; bias, activation, alpha, beta, z, format)
Return the convolution of filter w with tensor x, overwriting y if provided, according to keyword arguments or the convolution descriptor d. Optionally perform bias addition, activation and/or scaling:
y .= activation.(alpha * conv(w,x) + beta * z .+ bias)
All tensors should have the same number of dimensions n. If they have fewer than 4 dimensions, their sizes are assumed to be padded on the left with ones. x has size (X...,Cx,N) where (X...) are the spatial dimensions, Cx is the number of input channels, and N is the number of instances. y,z have size (Y...,Cy,N) where (Y...) are the spatial dimensions and Cy is the number of output channels (y and z can be the same array). Both Cx and Cy have to be an exact multiple of group. w has size (W...,Cx÷group,Cy) where (W...) are the filter dimensions. bias has size (1...,Cy,1).
The arguments padding, stride and dilation can be specified as n-2 dimensional vectors, tuples or a single integer which is assumed to be repeated n-2 times. If any of the entries is larger than the corresponding x dimension, the x dimension is used instead. For a description of different types of convolution see: https://towardsdatascience.com/a-comprehensive-introduction-to-different-types-of-convolutions-in-deep-learning-669281e58215
Keyword arguments:
activation = CUDNN_ACTIVATION_IDENTITY: the only other supported option is CUDNN_ACTIVATION_RELU
bias = nothing: add bias if provided
z = nothing: add beta*z; z can be nothing, y or another array similar to y
alpha = 1, beta = 0: scaling parameters
format = CUDNN_TENSOR_NCHW: order of tensor dimensions, the other alternative is CUDNN_TENSOR_NHWC. Note that Julia dimensions will have the opposite order, i.e. WHCN or CWHN.
Keyword arguments describing the convolution when d is not given:
mode = CUDNN_CONVOLUTION: alternatively CUDNN_CROSS_CORRELATION
padding = 0: padding assumed around x
stride = 1: how far to shift the convolution window at each step
dilation = 1: dilation factor
group = 1: number of groups to be used
mathType = cuDNN.math_mode(): whether or not the use of tensor op is permitted
reorderType = CUDNN_DEFAULT_REORDER: convolution reorder type
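
To see how padding, stride, dilation and mode interact, the sketch below models a single-channel 1-D case on the CPU. It is an illustration only: conv_outsize and conv1d_reference are hypothetical names, the output-size expression is the usual cuDNN formula (it is not stated on this page), and channels, groups, bias, activation and the clamping of oversized entries are deliberately left out.

```julia
# Output length along one spatial dimension (standard formula, assumed here).
conv_outsize(x, w; padding=0, stride=1, dilation=1) =
    1 + (x + 2 * padding - ((w - 1) * dilation + 1)) ÷ stride

# Single-channel 1-D reference. flip=true models CUDNN_CONVOLUTION (kernel reversed),
# flip=false models CUDNN_CROSS_CORRELATION.
function conv1d_reference(w::Vector, x::Vector; padding=0, stride=1, dilation=1, flip=true)
    xp = [zeros(eltype(x), padding); x; zeros(eltype(x), padding)]  # zero padding on both sides
    k = flip ? reverse(w) : w
    n = conv_outsize(length(x), length(w); padding=padding, stride=stride, dilation=dilation)
    out = zeros(eltype(x), n)
    for i in 1:n, j in eachindex(k)
        out[i] += k[j] * xp[(i - 1) * stride + (j - 1) * dilation + 1]
    end
    return out
end

x = Float32[1, 2, 3, 4, 5]
w = Float32[1, 0, -1]

conv_outsize(5, 3)                        # 3
conv_outsize(5, 3; padding=1, stride=2)   # 3
conv1d_reference(w, x)                    # Float32[2, 2, 2]
conv1d_reference(w, x; flip=false)        # Float32[-2, -2, -2]
```

The two modes differ only in whether the filter is reversed before sliding, so for a symmetric filter they give the same result.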

cuDNN.cudnnConvolutionForward! — Function
See cudnnConvolutionForward, which documents both the allocating and the in-place methods.

cuDNN.cudnnConvolutionFwdAlgoPerf — Function
cudnnConvolutionFwdAlgoPerf(xDesc, x, wDesc, w, convDesc, yDesc, y, biasDesc, activation, allocateTmpBuf=true)
allocateTmpBuf controls whether a temporary buffer is allocated for the output y. It can be set to false when beta is zero to save an allocation and must otherwise be set to true.

cuDNN.cudnnDropoutForward — Function
cudnnDropoutForward(x; dropout=0.5)
cudnnDropoutForward(x, d::cudnnDropoutDescriptor)
cudnnDropoutForward!(y, x; dropout=0.5)
cudnnDropoutForward!(y, x, d::cudnnDropoutDescriptor)
Return a new array similar to x where approximately a dropout fraction of the values are replaced by 0, and the rest are scaled by 1/(1-dropout). Optionally y holds the result and d specifies the operation. y should be similar to x if specified.
The user can set the global seed cudnnDropoutSeed[] to a positive number to always drop the same values deterministically for debugging. Note that this slows down the operation by about 40x.
The global constant cudnnDropoutState::Dict holds the random number generator state for each cuDNN handle.
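
The drop-and-rescale behaviour described above can be modelled on the CPU. This is a sketch under assumptions: dropout_reference is a hypothetical name, and it uses Julia's Random stdlib rather than cuDNN's GPU generator, so the particular values dropped will not match cuDNN.

```julia
using Random

# CPU model of the documented behaviour; not the cuDNN.jl implementation.
function dropout_reference(x; dropout=0.5, rng=Random.default_rng())
    scale = eltype(x)(1 / (1 - dropout))        # survivors are scaled by 1/(1-dropout)
    keep = rand(rng, size(x)...) .>= dropout    # each value survives with probability 1-dropout
    return x .* keep .* scale
end

x = ones(Float32, 10_000)
y = dropout_reference(x; dropout=0.25, rng=MersenneTwister(1))

# Every element of y is either 0 or 1/(1 - 0.25); roughly a quarter are zero,
# so the mean of y stays close to the mean of x.
```

Passing an explicitly seeded rng plays the same role in this sketch as setting cudnnDropoutSeed[] does for the real operation: the same elements are dropped on every run.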

cuDNN.cudnnDropoutForward! — Function
See cudnnDropoutForward, which documents both the allocating and the in-place methods.

cuDNN.cudnnGetRNNWeightParams — Method
cudnnGetRNNWeightParams(w, d::cudnnRNNDescriptor)
cudnnGetRNNWeightParams(w; hiddenSize, o...)
Return an array of weight matrices and bias vectors of an RNN specified by d or keyword options as views into w. The keyword arguments and defaults in the second form are the same as those in cudnnRNNForward specifying the RNN.
In the returned array a[1,l,p] and a[2,l,p] give the weight matrix and bias vector for the l'th layer and p'th parameter, or nothing if the specified matrix/vector does not exist. Note that the matrices should be transposed for left multiplication, e.g. `a[1,l,p]' * x`.
The l index refers to the pseudo-layer number. In uni-directional RNNs, a pseudo-layer is the same as a physical layer (pseudoLayer=1 is the RNN input layer, pseudoLayer=2 is the first hidden layer). In bi-directional RNNs, there are twice as many pseudo-layers in comparison to physical layers:
pseudoLayer=1 refers to the forward direction sub-layer of the physical input layer
pseudoLayer=2 refers to the backward direction sub-layer of the physical input layer
pseudoLayer=3 is the forward direction sub-layer of the first hidden layer, and so on
The p index refers to the weight matrix or bias vector linear ID index.
If cellMode in rnnDesc was set to CUDNN_RNN_RELU or CUDNN_RNN_TANH:
Value 1 references the weight matrix or bias vector used in conjunction with the input from the previous layer or input to the RNN model.
Value 2 references the weight matrix or bias vector used in conjunction with the hidden state from the previous time step or the initial hidden state.
If cellMode in rnnDesc was set to CUDNN_LSTM:
Values 1, 2, 3 and 4 reference weight matrices or bias vectors used in conjunction with the input from the previous layer or input to the RNN model.
Values 5, 6, 7 and 8 reference weight matrices or bias vectors used in conjunction with the hidden state from the previous time step or the initial hidden state.
Value 9 corresponds to the projection matrix, if enabled (there is no bias in this operation).
Values and their LSTM gates:
Values 1 and 5 correspond to the input gate.
Values 2 and 6 correspond to the forget gate.
Values 3 and 7 correspond to the new cell state calculations with hyperbolic tangent.
Values 4 and 8 correspond to the output gate.
If cellMode in rnnDesc was set to CUDNN_GRU:
Values 1, 2 and 3 reference weight matrices or bias vectors used in conjunction with the input from the previous layer or input to the RNN model.
Values 4, 5 and 6 reference weight matrices or bias vectors used in conjunction with the hidden state from the previous time step or the initial hidden state.
Values and their GRU gates:
Values 1 and 4 correspond to the reset gate.
Values 2 and 5 correspond to the update gate.
Values 3 and 6 correspond to the new hidden state calculations with hyperbolic tangent.

cuDNN.cudnnMultiHeadAttnForward — Function
cudnnMultiHeadAttnForward(weights, queries, keys, values; o...)
cudnnMultiHeadAttnForward!(out, weights, queries, keys, values; o...)
cudnnMultiHeadAttnForward(weights, queries, keys, values, d::cudnnAttnDescriptor; o...)
cudnnMultiHeadAttnForward!(out, weights, queries, keys, values, d::cudnnAttnDescriptor; o...)
Return the multi-head attention result with weights, queries, keys, and values, overwriting out if provided, according to keyword arguments or the attention descriptor d. The multi-head attention model can be described by the following equations:
\[\begin{aligned} &h_i = (W_{V,i} V) \operatorname{softmax}(\operatorname{smScaler}(K^T W^T_{K,i}) (W_{Q,i} q)) \\ &\operatorname{MultiHeadAttn}(q,K,V,W_Q,W_K,W_V,W_O) = \sum_{i=1}^{\operatorname{nHeads}} W_{O,i} h_i \end{aligned}\]
The input arguments are:
out: Optional output tensor.
weights: A weight buffer that contains $W_Q, W_K, W_V, W_O$.
queries: A query tensor $Q$ which may contain a batch of queries (the above equations were for a single query vector $q$ for simplicity).
keys: The keys tensor $K$.
values: The values tensor $V$.
Keyword arguments describing the tensors:
axes::Vector{cudnnSeqDataAxis_t} = [CUDNN_SEQDATA_VECT_DIM, CUDNN_SEQDATA_BATCH_DIM, CUDNN_SEQDATA_TIME_DIM, CUDNN_SEQDATA_BEAM_DIM]: an array of length 4 that specifies the role of (Julia) dimensions. VECT has to be the first dimension, all 6 permutations of the remaining three are supported.
seqLengthsQO::Vector{<:Integer}: sequence lengths in the queries and out containers. By default sequences are assumed to be full length of the TIME dimension.
seqLengthsKV::Vector{<:Integer}: sequence lengths in the keys and values containers. By default sequences are assumed to be full length of the TIME dimension.
Keyword arguments describing the attention operation when d is not given:
attnMode::Unsigned = CUDNN_ATTN_QUERYMAP_ALL_TO_ONE | CUDNN_ATTN_DISABLE_PROJ_BIASES: bitwise flags indicating various attention options. See cudnn docs for details.
nHeads::Integer = 1: number of attention heads.
smScaler::Real = 1: softmax smoothing (1.0 >= smScaler >= 0.0) or sharpening (smScaler > 1.0) coefficient. Negative values are not accepted.
mathType::cudnnMathType_t = math_mode(): NVIDIA Tensor Core settings.
qProjSize, kProjSize, vProjSize, oProjSize: vector lengths after projections, set to 0 by default which disables projections.
qoMaxSeqLength::Integer: largest sequence length expected in queries and out, set to their TIME dim by default.
kvMaxSeqLength::Integer: largest sequence length expected in keys and values, set to their TIME dim by default.
maxBatchSize::Integer: largest batch size expected in any container, set to the BATCH dim of queries by default.
maxBeamSize::Integer: largest beam size expected in any container, set to the BEAM dim of queries by default.
Other keyword arguments:
residuals = nothing: optional tensor with the same size as queries that can be used to implement residual connections (see figure in cudnn docs). When residual connections are enabled, the vector length in queries should match the vector length in out, so that a vector addition is feasible.
currIdx::Integer = -1: Time-step (0-based) in queries to process. When the currIdx argument is negative, all $Q$ time-steps are processed. When currIdx is zero or positive, the forward response is computed for the selected time-step only. The latter input can be used in inference mode only, to process one time-step while updating the next attention window and $Q$, $K$, $V$ inputs in-between calls.
loWinIdx, hiWinIdx::Array{Cint}: Two host integer arrays specifying the start and end (0-based) indices of the attention window for each $Q$ time-step. The start index in $K$, $V$ sets is inclusive, and the end index is exclusive. By default set at 0 and kvMaxSeqLength respectively.
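
The attention equations for a single query vector can be written out directly on the CPU. The sketch below is illustrative only: attention_reference and softmax are hypothetical names, the per-head projection matrices are passed as plain vectors of matrices (cuDNN instead packs them into the weights buffer), and the sum runs over every head. Batching, beams, attention windows, residuals and projection biases are not modelled.

```julia
using LinearAlgebra

# Numerically stable softmax over a vector.
function softmax(v)
    e = exp.(v .- maximum(v))
    return e ./ sum(e)
end

# Single-query reference. WQ, WK, WV, WO each hold one projection matrix per head
# (a layout assumed for this sketch). K and V have one column per time-step.
function attention_reference(q, K, V, WQ, WK, WV, WO; smScaler=1)
    out = zeros(size(WO[1], 1))
    for i in eachindex(WQ)
        scores = smScaler .* ((K' * WK[i]') * (WQ[i] * q))  # one score per key time-step
        h = (WV[i] * V) * softmax(scores)                   # weighted sum of projected values
        out .+= WO[i] * h
    end
    return out
end

Id = Matrix(1.0I, 2, 2)
q = [1.0, 0.0]
K = zeros(2, 3)                   # identical keys, so the attention weights are uniform
V = [1.0 2.0 3.0; 4.0 5.0 6.0]

attention_reference(q, K, V, [Id], [Id], [Id], [Id])   # ≈ [2.0, 5.0], the mean of V's columns
```

With identity projections and uniform weights the result reduces to the column mean of V, which makes the example easy to verify by hand.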

cuDNN.cudnnMultiHeadAttnForward! — Function
See cudnnMultiHeadAttnForward, which documents both the allocating and the in-place methods.

cuDNN.cudnnNormalizationForward — Function
cudnnNormalizationForward(x, xmean, xvar, bias, scale; o...)
cudnnNormalizationForward!(y, x, xmean, xvar, bias, scale; o...)
Return batch normalization applied to x:
y .= ((x .- mean(x; dims)) ./ sqrt.(epsilon .+ var(x; dims))) .* scale .+ bias # training
y .= ((x .- xmean) ./ sqrt.(epsilon .+ xvar)) .* scale .+ bias # inference
bias and scale are trainable parameters. xmean and xvar are modified to collect statistics during training and treated as constants during inference. Note that during inference the values given by the xmean and xvar arguments are used in the formula, whereas during training the actual mean and variance of the minibatch are used: the xmean/xvar arguments are only used to collect statistics. In the original paper bias is referred to as beta and scale as gamma (Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S. Ioffe, C. Szegedy, 2015).
Keyword arguments:
epsilon = 1e-5: epsilon value used in the normalization formula
exponentialAverageFactor = 0.1: factor used in running mean/variance calculation: runningMean = runningMean*(1-factor) + newMean*factor
training = false: boolean indicating training vs inference mode
mode::cudnnNormMode_t = CUDNN_NORM_PER_CHANNEL: Per-channel layer is based on the paper. In this mode scale etc. have dimensions (1,1,C,1). The other alternative is CUDNN_NORM_PER_ACTIVATION where scale etc. have dimensions (W,H,C,1).
algo::cudnnNormAlgo_t = CUDNN_NORM_ALGO_STANDARD: The other alternative, CUDNN_NORM_ALGO_PERSIST, triggers the new semi-persistent NHWC kernel when certain conditions are met (see cudnn docs).
normOps::cudnnNormOps_t = CUDNN_NORM_OPS_NORM: Currently the other alternatives, CUDNN_NORM_OPS_NORM_ACTIVATION and CUDNN_NORM_OPS_NORM_ADD_ACTIVATION, are not supported.
z = nothing: for residual addition to the result of the normalization operation, prior to the activation (will be supported when CUDNN_NORM_OPS_NORM_ADD_ACTIVATION is supported)
groupCnt = 1: placeholder for future work, should be set to 1 for now
alpha = 1; beta = 0: scaling parameters: return alpha * new_y + beta * old_y
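
The training and inference formulas can be compared with a CPU-only sketch of the per-channel mode for a (W, H, C, N) array. batchnorm_reference is a hypothetical name; the use of the biased variance (corrected=false) is an assumption of this sketch, and alpha, beta, algo and normOps are not modelled.

```julia
using Statistics

# CPU model of CUDNN_NORM_PER_CHANNEL; not the cuDNN.jl implementation.
# Statistics are taken over every dimension except the channel dimension.
function batchnorm_reference(x, xmean, xvar, bias, scale;
                             training=false, epsilon=1e-5, exponentialAverageFactor=0.1)
    if training
        dims = (1, 2, 4)
        m = mean(x; dims=dims)
        v = var(x; dims=dims, mean=m, corrected=false)   # biased variance: an assumption
        # Running statistics are updated in place; they do not enter the training formula.
        xmean .= xmean .* (1 - exponentialAverageFactor) .+ m .* exponentialAverageFactor
        xvar .= xvar .* (1 - exponentialAverageFactor) .+ v .* exponentialAverageFactor
        return (x .- m) ./ sqrt.(epsilon .+ v) .* scale .+ bias
    else
        return (x .- xmean) ./ sqrt.(epsilon .+ xvar) .* scale .+ bias
    end
end

x = randn(4, 4, 3, 8)
xmean, xvar = zeros(1, 1, 3, 1), ones(1, 1, 3, 1)
bias, scale = zeros(1, 1, 3, 1), ones(1, 1, 3, 1)

y = batchnorm_reference(x, xmean, xvar, bias, scale; training=true)
# Each channel of y now has mean ≈ 0 and variance ≈ 1, and xmean/xvar have moved
# 10% of the way towards the minibatch statistics.
```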

cuDNN.cudnnNormalizationForward! — Function
See cudnnNormalizationForward, which documents both the allocating and the in-place methods.

cuDNN.cudnnOpTensor — Function
cudnnOpTensor(x1, x2; op, compType, nanOpt, alpha1, alpha2)
cudnnOpTensor(x1, x2, d::cudnnOpTensorDescriptor; alpha1, alpha2)
cudnnOpTensor!(y, x1, x2; op, compType, nanOpt, alpha1, alpha2, beta)
cudnnOpTensor!(y, x1, x2, d::cudnnOpTensorDescriptor; alpha1, alpha2, beta)
Return the result of the specified broadcasting operation applied to x1 and x2. Optionally y holds the result and d specifies the operation. Each dimension of the input tensor x1 must match the corresponding dimension of the destination tensor y, and each dimension of the input tensor x2 must match the corresponding dimension of the destination tensor y or must be equal to 1. Keyword arguments:
alpha1=1, alpha2=1, beta=0 are used for scaling, i.e. y .= beta*y .+ op.(alpha1*x1, alpha2*x2)
Keyword arguments used when cudnnOpTensorDescriptor is not specified:
op = CUDNN_OP_TENSOR_ADD: ADD can be replaced with MUL, MIN, MAX, SQRT, NOT; SQRT and NOT are performed only on x1; NOT computes 1-x1
compType = (eltype(x1) <: Float64 ? Float64 : Float32): computation datatype (see cudnn docs for available options)
nanOpt = CUDNN_NOT_PROPAGATE_NAN: NaN propagation policy. The other option is CUDNN_PROPAGATE_NAN.
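
The shape rule and the scaling formula y .= beta*y .+ op.(alpha1*x1, alpha2*x2) can be tried out on the CPU. In this sketch optensor_reference! is a hypothetical name and ordinary Julia functions (+, max) stand in for the CUDNN_OP_TENSOR_* constants.

```julia
# CPU model of the documented formula; not the cuDNN.jl implementation.
function optensor_reference!(y, x1, x2; op=+, alpha1=1, alpha2=1, beta=0)
    # x1 must match y exactly; x2 must match y or have size 1 in each dimension.
    @assert size(x1) == size(y)
    @assert all(d -> size(x2, d) == size(y, d) || size(x2, d) == 1, 1:ndims(y))
    y .= beta .* y .+ op.(alpha1 .* x1, alpha2 .* x2)
    return y
end

x1 = Float32[1 2 3; 4 5 6]
x2 = reshape(Float32[10, 20], 2, 1)   # second dimension is 1, so x2 is broadcast across columns
y = zeros(Float32, 2, 3)

optensor_reference!(y, x1, x2)                     # [11 12 13; 24 25 26]
optensor_reference!(y, x1, x2; op=max, alpha1=5)   # [10 10 15; 20 25 30]
```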

cuDNN.cudnnOpTensor! — Function
See cudnnOpTensor, which documents both the allocating and the in-place methods.
cuDNN.cudnnPoolingForward — Function
cudnnPoolingForward(x; mode, nanOpt, window, padding, stride, alpha)
cudnnPoolingForward(x, d::cudnnPoolingDescriptor; alpha)
cudnnPoolingForward!(y, x; mode, nanOpt, window, padding, stride, alpha, beta)
cudnnPoolingForward!(y, x, d::cudnnPoolingDescriptor; alpha, beta)
Return pooled x, overwriting y if provided, according to keyword arguments or the pooling descriptor d. Please see the cuDNN docs for details.
The dimensions of x,y tensors that are less than 4-D are assumed to be padded on the left with 1's. The first n-2 are spatial dimensions, the last two are always assumed to be channel and batch.
The arguments window, padding, and stride can be specified as n-2 dimensional vectors, tuples or a single integer which is assumed to be repeated n-2 times. If any of the entries is larger than the corresponding x dimension, the x dimension is used instead.
Arguments:
mode = CUDNN_POOLING_MAX: Pooling method, other options are CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING, CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING, CUDNN_POOLING_MAX_DETERMINISTIC
nanOpt = CUDNN_NOT_PROPAGATE_NAN: NaN propagation policy, the other option is CUDNN_PROPAGATE_NAN
window = 2: Pooling window size
padding = 0: Padding assumed around x
stride = window: How far to shift pooling window at each step
alpha=1, beta=0 can be used for scaling, i.e. y .= alpha * op(x) .+ beta * y
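The rule for window, padding, and stride (an integer is repeated n-2 times, and entries larger than the matching x dimension are replaced by that dimension) can be sketched as a small CPU helper. The name expand_pool_arg is hypothetical, and the sketch assumes x already has at least three dimensions; it is not the package's own implementation.

```julia
# Hypothetical helper: expand window/padding/stride the way the text describes.
# An integer is repeated once per spatial dimension; entries larger than the
# corresponding x dimension are replaced by that dimension.
function expand_pool_arg(arg, x::AbstractArray)
    nspatial = ndims(x) - 2                      # last two dims are channel and batch
    vals = arg isa Integer ? fill(Int(arg), nspatial) : collect(Int, arg)
    length(vals) == nspatial || throw(ArgumentError("expected $nspatial entries"))
    return ntuple(i -> min(vals[i], size(x, i)), nspatial)
end

x = zeros(Float32, 8, 3, 16, 4)                  # two spatial dims, then C and N
window = expand_pool_arg(2, x)                   # integer repeated for both spatial dims
big    = expand_pool_arg((5, 5), x)              # second entry exceeds size(x, 2) == 3
```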
cuDNN.cudnnPoolingForward! — Function
Shares its docstring with cudnnPoolingForward; see that entry above.
cuDNN.cudnnRNNForward — Function
cudnnRNNForward(w, x; hiddenSize, o...)
cudnnRNNForward!(y, w, x; hiddenSize, o...)
cudnnRNNForward(w, x, d::cudnnRNNDescriptor; o...)
cudnnRNNForward!(y, w, x, d::cudnnRNNDescriptor; o...)
Apply the RNN specified with weights w and configuration given by d or keyword options to input x.
Keyword arguments for hidden input/output:
hx=nothing: initialize the hidden vector if specified (by default initialized to 0).
cx=nothing: initialize the cell vector (only in LSTMs) if specified (by default initialized to 0).
hy=nothing: return the final hidden vector in hy if set to Ref{Any}().
cy=nothing: return the final cell vector in cy (only in LSTMs) if set to Ref{Any}().
Keyword arguments specifying the RNN when d::cudnnRNNDescriptor is not given:
hiddenSize::Integer: hidden vector size, which must be supplied when d is not given
algo::cudnnRNNAlgo_t = CUDNN_RNN_ALGO_STANDARD: RNN algo (CUDNN_RNN_ALGO_STANDARD, CUDNN_RNN_ALGO_PERSIST_STATIC, or CUDNN_RNN_ALGO_PERSIST_DYNAMIC).
cellMode::cudnnRNNMode_t = CUDNN_LSTM: Specifies the RNN cell type in the entire model (CUDNN_RNN_RELU, CUDNN_RNN_TANH, CUDNN_LSTM, CUDNN_GRU).
biasMode::cudnnRNNBiasMode_t = CUDNN_RNN_DOUBLE_BIAS: Sets the number of bias vectors (CUDNN_RNN_NO_BIAS, CUDNN_RNN_SINGLE_INP_BIAS, CUDNN_RNN_SINGLE_REC_BIAS, CUDNN_RNN_DOUBLE_BIAS). The two single bias settings are functionally the same for RELU, TANH and LSTM cell types. For differences in GRU cells, see the description of CUDNN_GRU in cudnn docs.
dirMode::cudnnDirectionMode_t = CUDNN_UNIDIRECTIONAL: Specifies the recurrence pattern: CUDNN_UNIDIRECTIONAL or CUDNN_BIDIRECTIONAL. In bidirectional RNNs, the hidden states passed between physical layers are concatenations of forward and backward hidden states.
inputMode::cudnnRNNInputMode_t = CUDNN_LINEAR_INPUT: Specifies how the input to the RNN model is processed by the first layer. When inputMode is CUDNN_LINEAR_INPUT, original input vectors of size inputSize are multiplied by the weight matrix to obtain vectors of hiddenSize. When inputMode is CUDNN_SKIP_INPUT, the original input vectors to the first layer are used as is without multiplying them by the weight matrix.
mathPrec::DataType = eltype(x): This parameter is used to control the compute math precision in the RNN model. For Float16 input/output, mathPrec can be Float16 or Float32; for Float32 or Float64 input/output, it must match the input/output type.
mathType::cudnnMathType_t = math_mode(): Sets the preferred option to use NVIDIA Tensor Cores accelerators on Volta (SM 7.0) or higher GPUs. When dataType is CUDNN_DATA_HALF, the mathType parameter can be CUDNN_DEFAULT_MATH or CUDNN_TENSOR_OP_MATH. The ALLOW_CONVERSION setting is treated the same as CUDNN_TENSOR_OP_MATH for this data type. When dataType is CUDNN_DATA_FLOAT, the mathType parameter can be CUDNN_DEFAULT_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION. When the latter setting is used, original weights and intermediate results will be down-converted to CUDNN_DATA_HALF before they are used in another recursive iteration. When dataType is CUDNN_DATA_DOUBLE, the mathType parameter can be CUDNN_DEFAULT_MATH.
inputSize::Integer = size(x,1): Size of the input vector in the RNN model. When inputMode=CUDNN_SKIP_INPUT, inputSize should match the hiddenSize value.
projSize::Integer = hiddenSize: The size of the LSTM cell output after the recurrent projection. This value should not be larger than hiddenSize. It is legal to set projSize equal to hiddenSize; however, in this case the recurrent projection feature is disabled. The recurrent projection is an additional matrix multiplication in the LSTM cell to project hidden state vectors ht into smaller vectors rt = Wr * ht, where Wr is a rectangular matrix with projSize rows and hiddenSize columns. When the recurrent projection is enabled, the output of the LSTM cell (both to the next layer and unrolled in-time) is rt instead of ht. The recurrent projection can be enabled for LSTM cells and CUDNN_RNN_ALGO_STANDARD only.
numLayers::Integer = 1: Number of stacked, physical layers in the deep RNN model. When dirMode=CUDNN_BIDIRECTIONAL, the physical layer consists of two pseudo-layers corresponding to forward and backward directions.
dropout::Real = 0: When non-zero, dropout operation will be applied between physical layers. A single layer network will have no dropout applied. Dropout is used in the training mode only.
auxFlags::Integer = CUDNN_RNN_PADDED_IO_ENABLED: Miscellaneous switches that do not require additional numerical values to configure the corresponding feature. In future cuDNN releases, this parameter will be used to extend the RNN functionality without adding new API functions (applicable options should be bitwise OR-ed). Currently, this parameter is used to enable or disable padded input/output (CUDNN_RNN_PADDED_IO_DISABLED, CUDNN_RNN_PADDED_IO_ENABLED). When the padded I/O is enabled, layouts CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED and CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED are permitted in RNN data descriptors.
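The shapes involved in the recurrent projection described for projSize can be illustrated with a few lines of CPU code. This is a shape-only sketch with random data and made-up sizes; it does not reproduce the LSTM cell itself or call cuDNN.

```julia
# Shape-only sketch of the recurrent projection rt = Wr * ht (CPU, random data).
hiddenSize, projSize = 6, 4                 # projSize must not exceed hiddenSize
Wr = randn(Float32, projSize, hiddenSize)   # projSize rows, hiddenSize columns
ht = randn(Float32, hiddenSize)             # hidden state of one sequence at one step
rt = Wr * ht                                # projected vector passed on instead of ht
```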
Other keyword arguments:
layout::cudnnRNNDataLayout_t = CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED: The memory layout of the RNN data tensor. Options are CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED: Data layout is padded, with outer stride from one time-step to the next; CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_PACKED: The sequence length is sorted and packed as in the basic RNN API; CUDNN_RNN_DATA_LAYOUT_BATCH_MAJOR_UNPACKED: Data layout is padded, with outer stride from one batch to the next.
seqLengthArray::Vector{Cint} = nothing: An integer array with batchSize number of elements. Describes the length (number of time-steps) of each sequence. Each element in seqLengthArray must be greater than or equal to 0 but less than or equal to maxSeqLength. In the packed layout, the elements should be sorted in descending order, similar to the layout required by the non-extended RNN compute functions. The default value nothing assumes uniform seqLengths, no padding.
devSeqLengths::CuVector{Cint} = nothing: Device copy of seqLengthArray
fwdMode::cudnnForwardMode_t = CUDNN_FWD_MODE_INFERENCE: set to CUDNN_FWD_MODE_TRAINING when training
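The constraints listed for seqLengthArray can be written down as a small host-side check. The function check_seq_lengths and its packed flag are hypothetical names for this sketch; the package may validate its inputs differently or not at all.

```julia
# Hypothetical checker for the constraints stated for seqLengthArray.
function check_seq_lengths(seqLengthArray::Vector{Cint}, batchSize, maxSeqLength; packed=false)
    # one entry per sequence in the batch
    length(seqLengthArray) == batchSize ||
        throw(ArgumentError("need one length per batch entry"))
    # every length lies between 0 and maxSeqLength, inclusive
    all(l -> 0 <= l <= maxSeqLength, seqLengthArray) ||
        throw(ArgumentError("lengths must lie in 0:maxSeqLength"))
    # the packed layout additionally expects descending order
    if packed
        issorted(seqLengthArray; rev=true) ||
            throw(ArgumentError("packed layout expects descending lengths"))
    end
    return true
end

ok = check_seq_lengths(Cint[5, 3, 3, 1], 4, 5; packed=true)
```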
cuDNN.cudnnRNNForward! — Function
Shares its docstring with cudnnRNNForward; see that entry above.
cuDNN.cudnnReduceTensor — Function
cudnnReduceTensor(x; dims, op, compType, nanOpt, indices, alpha)
cudnnReduceTensor(x, d::cudnnReduceTensorDescriptor; dims, indices, alpha)
cudnnReduceTensor!(y, x; op, compType, nanOpt, indices, alpha, beta)
cudnnReduceTensor!(y, x, d::cudnnReduceTensorDescriptor; indices, alpha, beta)
Return the result of the specified reduction operation applied to x. Optionally y holds the result and d specifies the operation. Each dimension of the output tensor y must match the corresponding dimension of the input tensor x or must be equal to 1. The dimensions equal to 1 indicate the dimensions of x to be reduced.
Keyword arguments:
dims = ntuple(i->1,ndims(x)): specifies the shape of the output when y is not given
indices = nothing: previously allocated space for writing indices which can be generated for min and max ops only, can be a CuArray of UInt8, UInt16, UInt32 or UInt64
alpha=1, beta=0 are used for scaling, i.e. y .= alpha * op.(x) .+ beta * y
Keyword arguments that can be used when d::cudnnReduceTensorDescriptor is not specified:
op = CUDNN_REDUCE_TENSOR_ADD: Reduction operation, ADD can be replaced with MUL, MIN, MAX, AMAX, AVG, NORM1, NORM2, MUL_NO_ZEROS
compType = (eltype(x) <: Float64 ? Float64 : Float32): Computation datatype
nanOpt = CUDNN_NOT_PROPAGATE_NAN: NaN propagation policy, the other option is CUDNN_PROPAGATE_NAN
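The shape convention (output dimensions equal to 1 mark the dimensions of x that get reduced) can be mimicked on the CPU with Base.reduce. The name reduce_ref is hypothetical, and op is assumed to be an ordinary Julia binary function standing in for the cuDNN reduction op.

```julia
# CPU analogue of the shape rule: dimensions of the output equal to 1 are reduced.
# `reduce_ref` is a hypothetical name; `op` is an ordinary Julia binary function.
function reduce_ref(x; dims=ntuple(i -> 1, ndims(x)), op=+, alpha=1)
    # each output dimension must either match x or be 1
    all(i -> dims[i] == size(x, i) || dims[i] == 1, 1:ndims(x)) ||
        throw(DimensionMismatch("each output dimension must match x or be 1"))
    reduced = Tuple(i for i in 1:ndims(x) if dims[i] == 1)   # axes to collapse
    return alpha .* reduce(op, x; dims=reduced)
end

x = reshape(collect(1.0:6.0), 2, 3)      # [1 3 5; 2 4 6]
colsum = reduce_ref(x; dims=(1, 3))      # output shape (1, 3): reduce the first dimension
total  = reduce_ref(x)                   # default dims reduces every dimension
```

With the default dims, every dimension is 1, so the result is a full reduction that keeps the number of dimensions of x.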
cuDNN.cudnnReduceTensor! — Function
Shares its docstring with cudnnReduceTensor; see that entry above.
cuDNN.cudnnScaleTensor — Function
cudnnScaleTensor(x, s)
cudnnScaleTensor!(y, x, s)
Scale all elements of tensor x with scale s and return the result. cudnnScaleTensor allocates a new array for the answer, cudnnScaleTensor! overwrites y.
cuDNN.cudnnScaleTensor! — Function
Shares its docstring with cudnnScaleTensor; see that entry above.
cuDNN.cudnnSetTensor! — Method
cudnnSetTensor!(x, s)
Set all elements of tensor x to scalar s and return x.
cuDNN.cudnnSoftmaxForward — Function
cudnnSoftmaxForward(x; algo, mode, alpha)
cudnnSoftmaxForward!(y, x; algo, mode, alpha, beta)
Return the softmax or logsoftmax of the input x depending on the algo keyword argument. The y argument holds the result and it should be similar to x if specified.
Keyword arguments:
algo = (CUDA.math_mode()===CUDA.FAST_MATH ? CUDNN_SOFTMAX_FAST : CUDNN_SOFTMAX_ACCURATE): Options are CUDNN_SOFTMAX_ACCURATE which subtracts max from every point to avoid overflow, CUDNN_SOFTMAX_FAST which doesn't and CUDNN_SOFTMAX_LOG which returns logsoftmax.
mode = CUDNN_SOFTMAX_MODE_INSTANCE: Compute softmax per image (N) across the dimensions C,H,W. CUDNN_SOFTMAX_MODE_CHANNEL computes softmax per spatial location (H,W) per image (N) across the dimension C.
alpha=1, beta=0 can be used for scaling, i.e. y .= alpha * op(x) .+ beta * y
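The difference between the three algo options, and between the two modes, can be seen in a CPU sketch. The function names are illustrative and nothing below calls cuDNN; the array is assumed to be laid out as (W, H, C, N), so the channel mode corresponds to dims=3 and the instance mode to dims=(1, 2, 3).

```julia
# CPU sketch of the three algo choices along a chosen set of dimensions.
# Names here are illustrative; nothing below calls cuDNN.
softmax_fast(x; dims) = exp.(x) ./ sum(exp.(x); dims=dims)   # no overflow protection

function softmax_accurate(x; dims)
    e = exp.(x .- maximum(x; dims=dims))     # subtract the max to avoid overflow
    return e ./ sum(e; dims=dims)
end

function softmax_log(x; dims)
    s = x .- maximum(x; dims=dims)
    return s .- log.(sum(exp.(s); dims=dims))
end

x = reshape(Float32[1, 2, 3, 1000, 1001, 1002], 1, 1, 3, 2)   # (W, H, C, N)
y_channel  = softmax_accurate(x; dims=3)           # like CUDNN_SOFTMAX_MODE_CHANNEL
y_instance = softmax_accurate(x; dims=(1, 2, 3))   # like CUDNN_SOFTMAX_MODE_INSTANCE
y_fast     = softmax_fast(x; dims=3)               # exp overflows for the second image
```

The second image has large inputs, so the unprotected variant produces NaN there while the max-subtracting variant gives the same distribution as for the first image.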
cuDNN.cudnnSoftmaxForward! — Function
Shares its docstring with cudnnSoftmaxForward; see that entry above.
cuDNN.sdim — Method
sdim(x,axes,dim)
sdim(x,axes)
The first form returns the size of x in the dimension specified with dim::cudnnSeqDataAxis_t (e.g. CUDNN_SEQDATA_TIME_DIM), i.e. return size(x,i) such that axes[i]==dim.
The second form returns an array of length 4 dims::Vector{Cint} such that dims[1+dim] == sdim(x,axes,dim) where dim::cudnnSeqDataAxis_t specifies the role of the dimension (e.g. dims[CUDNN_SEQDATA_TIME_DIM]==5).
The axes::Vector{cudnnSeqDataAxis_t} argument is an array of length 4 that specifies the role of Julia dimensions, e.g. axes[3]=CUDNN_SEQDATA_TIME_DIM.
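Both forms can be reproduced on the CPU with a stand-in enum. The enum SeqAxis, its numeric values 0 to 3, and the name sdim_ref are assumptions made for this sketch; the real cudnnSeqDataAxis_t comes from the cuDNN bindings.

```julia
# Stand-in enum; the real cudnnSeqDataAxis_t comes from the cuDNN bindings.
# The numeric values 0..3 are an assumption made for this sketch.
@enum SeqAxis TIME_DIM=0 BATCH_DIM=1 BEAM_DIM=2 VECT_DIM=3

# First form: size of x along the Julia dimension whose role is `dim`.
sdim_ref(x, axes, dim) = size(x, findfirst(==(dim), axes))

# Second form: length-4 vector indexed by role, i.e. dims[1 + Int(dim)].
sdim_ref(x, axes) = Cint[sdim_ref(x, axes, d) for d in instances(SeqAxis)]

x     = zeros(Float32, 8, 1, 5, 2)                    # 8 features, 1 beam, 5 steps, 2 sequences
roles = [VECT_DIM, BEAM_DIM, TIME_DIM, BATCH_DIM]     # roles[3] is the time dimension
dims  = sdim_ref(x, roles)
```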
cuDNN.@cudnnDescriptor — Macro
@cudnnDescriptor(XXX, setter=cudnnSetXXXDescriptor)
Defines a new type cudnnXXXDescriptor with a single field ptr::cudnnXXXDescriptor_t and its constructor. The second optional argument is the function that sets the descriptor fields and defaults to cudnnSetXXXDescriptor. The constructor is memoized, i.e. when called with the same arguments it returns the same object rather than creating a new one.
The arguments of the constructor and thus the keys to the memoization cache depend on the setter: If the setter has arguments cudnnSetXXXDescriptor(ptr::cudnnXXXDescriptor_t, args...), then the constructor has cudnnXXXDescriptor(args...). The user can control these arguments by defining a custom setter.
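The memoization behaviour can be modelled with a dictionary keyed by the constructor arguments. This toy version uses hypothetical names (ToyDescriptor, TOY_CACHE, toy_descriptor) and a plain tuple in place of the opaque handle; it is not the code the macro generates.

```julia
# Toy model of a memoized descriptor constructor; all names are hypothetical.
mutable struct ToyDescriptor
    fields::Tuple             # stands in for the state behind an opaque handle
end

const TOY_CACHE = Dict{Tuple,ToyDescriptor}()

# The argument tuple is the cache key: a repeated call returns the stored
# object, and only a new key creates a new descriptor.
toy_descriptor(args...) = get!(() -> ToyDescriptor(args), TOY_CACHE, args)

a = toy_descriptor(:relu, 0.0)
b = toy_descriptor(:relu, 0.0)    # same arguments: the cached object is returned
c = toy_descriptor(:tanh, 0.0)    # different arguments: a new object is created
```

Because the struct is mutable, a === b holds only when both names refer to the very same object, which is what the cache guarantees here.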