CrossEntropyLossLayer
CrossEntropyLossLayer["Index"]
represents a net layer that computes the cross-entropy loss by comparing input class probability vectors with indices representing the target class.
CrossEntropyLossLayer["Probabilities"]
represents a net layer that computes the cross-entropy loss by comparing input class probability vectors with target class probability vectors.
CrossEntropyLossLayer["Binary"]
represents a net layer that computes the binary cross-entropy loss by comparing input probability scalars with target probability scalars, where each probability represents a binary choice.
Details and Options



- CrossEntropyLossLayer exposes the following ports for use in NetGraph etc.:
  "Input": real array of rank n
  "Target": real array of rank n or integer array of rank n-1
  "Loss": real number
- When operating on multidimensional inputs, CrossEntropyLossLayer effectively threads over any extra array dimensions to produce an array of losses and returns the mean of these losses.
- For CrossEntropyLossLayer["Binary"], the input and target should be scalar values between 0 and 1, or arrays of these.
- For CrossEntropyLossLayer["Index"], the input should be a vector of probabilities {p1,…,pc} that sums to 1, or an array of such vectors. The target should be an integer between 1 and c, or an array of such integers.
- For CrossEntropyLossLayer["Probabilities"], the input and target should be a vector of probabilities that sums to 1, or an array of such vectors.
- For the "Index" and "Probabilities" forms, where the input array has dimensions {d1,d2,…,dn}, the final dimension dn is used to index the class. The output loss is taken to be the mean over the remaining dimensions {d1,…,dn-1}.
- CrossEntropyLossLayer[…][<|"Input"->in,"Target"->target|>] explicitly computes the output from applying the layer.
- CrossEntropyLossLayer[…][<|"Input"->{in1,in2,…},"Target"->{target1,target2,…}|>] explicitly computes outputs for each of the ini and targeti.
- When given a NumericArray as input, the output will be a NumericArray.
- CrossEntropyLossLayer is typically used inside NetGraph to construct a training network.
- CrossEntropyLossLayer can operate on arrays that contain "Varying" dimensions.
- A CrossEntropyLossLayer[…] can be provided as the third argument to NetTrain when training a specific network.
- When appropriate, CrossEntropyLossLayer is automatically used by NetTrain if an explicit loss specification is not provided. One of "Binary", "Probabilities", or "Index" will be chosen based on the final activation used for the output port and the form of any attached NetDecoder.
- CrossEntropyLossLayer[form,"port"->shape] allows the shape of the input or target port to be specified. Possible forms for shape are:
  "Real": a single real number
  "Integer": a single integer
  n: a vector of length n
  {n1,n2,…}: an array of dimensions n1×n2×…
  "Varying": a vector whose length is variable
  {"Varying",n2,n3,…}: an array whose first dimension is variable and whose remaining dimensions are n2×n3×…
  NetEncoder[…]: an encoder
  NetEncoder[{…,"Dimensions"->{n1,…}}]: an encoder mapped over an array of dimensions n1×…
- Options[CrossEntropyLossLayer] gives the list of default options to construct the layer. Options[CrossEntropyLossLayer[…]] gives the list of default options to evaluate the layer on some data.
- Information[CrossEntropyLossLayer[…]] gives a report about the net layer.
- Information[CrossEntropyLossLayer[…],prop] gives the value of the property prop of CrossEntropyLossLayer[…]. Possible properties are the same as for NetGraph.
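The three forms compute standard cross-entropy formulas. As an illustration of the underlying math only (this is Python, not Wolfram Language), here is a small sketch assuming the natural logarithm and the mean reduction over extra dimensions described above:

```python
import math

def binary_loss(p, t):
    # "Binary" form: p and t are scalar probabilities between 0 and 1
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def index_loss(probs, target):
    # "Index" form: probs is a probability vector summing to 1,
    # target is a 1-based class index
    return -math.log(probs[target - 1])

def probabilities_loss(probs, target):
    # "Probabilities" form: input and target are both probability vectors
    return -sum(t * math.log(p) for t, p in zip(target, probs))

def mean_loss(loss_fn, inputs, targets):
    # Threading over extra dimensions yields an array of losses,
    # and the layer returns their mean
    losses = [loss_fn(i, t) for i, t in zip(inputs, targets)]
    return sum(losses) / len(losses)
```

For example, `index_loss([0.25, 0.25, 0.5], 3)` gives log 2, the loss for assigning probability 1/2 to the correct class.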
Examples
Basic Examples (3): Summary of the most common use cases
Create a CrossEntropyLossLayer object that takes a probability vector and an index:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-9reqx3

Create a CrossEntropyLossLayer where the input is a probability vector and the target is an index:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-plcxfy

Apply it to an input and a target:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-3tu0cv

Create a CrossEntropyLossLayer that operates on vectors generated from strings:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-otb5aa

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-fru6tz

Apply it to an input and a target:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-baujnb

Completely correct predictions produce a loss of 0:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-yo5y67
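Why correct predictions give zero loss: for the "Index" form the loss is the negative log of the probability assigned to the target class, which vanishes when that probability is 1. A Python illustration of the formula (not Wolfram Language):

```python
import math

def index_loss(probs, target):
    # "Index"-form cross-entropy: negative log of the probability
    # assigned to the (1-based) target class
    return -math.log(probs[target - 1])

# A completely correct prediction puts probability 1 on the target,
# so the loss is 0; any spread-out prediction has positive loss
assert index_loss([0.0, 0.0, 1.0], 3) == 0
assert index_loss([0.5, 0.5], 1) > 0
```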

Scope (5): Survey of the scope of standard use cases
Create a CrossEntropyLossLayer where the input and target are single probabilities:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-opf9qp

Apply it to an input and a target:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-vc283n

Thread the layer over a batch of inputs:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-3lymae

Create a CrossEntropyLossLayer where the input is a probability vector and the target is an index:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-v0y3sc

Apply it to an input and a target:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-r45wuc

Create a CrossEntropyLossLayer where the input is a probability vector and the target is a probability vector:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-mpvmz1

Apply it to an input and a target:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-x8ex4u

Create a CrossEntropyLossLayer where the input and target are images representing matrices of binary class probabilities:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-4homlo

Apply the layer to an input and target:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-o3j490

Measure all possible losses from inputs and targets from a small set:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-ii4tvx

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-36jff8
The input-target pairs with the smallest losses:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-dzytq1

The input-target pairs with the largest losses:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-ml7xl1

Create a graph containing a CrossEntropyLossLayer in which the input is a 3-channel image whose color channels each represent a class, and the target is a matrix of indices giving the correct class for each pixel:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-xvpsq1

Measure the loss on a target image and matrix in which the areas that are predominantly red, green, and blue match the indices 1, 2, and 3, respectively, in the target:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-pxcg9o

Permuting the colors makes the per-pixel distributions disagree with the target matrix and increases the loss:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-ip7mc7

Applications (2): Sample problems that can be solved with this function
CrossEntropyLossLayer["Binary"] is used automatically by NetTrain when the final activation used for an output is an ElementwiseLayer[LogisticSigmoid]. Create a network that takes a pair of numbers and produces either True or False:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-rrr7ry

Train the network to decide if the first number in the pair is greater than the second:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-uda82a

The training network automatically constructed by NetTrain contains a binary-type CrossEntropyLossLayer:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-fhi8fu

Show the behavior of the trained network on the plane:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-67yzuz


https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-ralj0e

Plot the underlying probability learned by the network:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-d4uh60

CrossEntropyLossLayer["Index"] is used automatically by NetTrain when the final activation used for an output is a SoftmaxLayer. Create an artificial dataset from three normally distributed clusters:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-hhp24w

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-qddujl

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-fcsxx1

The training data consists of rules mapping the point to the cluster it is in:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-u4wtyb

Create a net to compute the probability of a point lying in each cluster, using a "Class" decoder to classify the input as either Red, Green or Blue:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-6r2iad


https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-0vjiho

The training network automatically constructed by NetTrain contains an index-type CrossEntropyLossLayer:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-r0b52c

Evaluate the net on the centers of each cluster:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-ni3s3s


https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-tdbu2l

Show the contours in feature space in which each class reaches a posterior probability of 0.75:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-1d95lt

Properties & Relations (4): Properties of the function, and connections to other functions
Here is the function computed by CrossEntropyLossLayer["Binary"]:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-5hya20
Evaluate the function on some data:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-2uzk67

This is equivalent to the following:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-kzkagb

When the target is fixed at 1, the loss is minimized as the input approaches 1:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-egkimd

In general, the loss is minimized when the target approaches the input:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-zscqcm
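These two minimization properties can be checked numerically. A Python sketch of the binary formula -(t log p + (1-t) log(1-p)) (illustrative only, not Wolfram Language), using a simple grid search to locate the minimum:

```python
import math

def binary_loss(p, t):
    # binary cross-entropy: -(t*log(p) + (1-t)*log(1-p))
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

# With the target fixed at 1, the loss falls as the input approaches 1
assert binary_loss(0.9, 1) < binary_loss(0.5, 1) < binary_loss(0.1, 1)

# More generally, for a fixed target t the loss is minimized at p == t
# (setting the derivative -t/p + (1-t)/(1-p) to zero gives p = t)
t = 0.3
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda p: binary_loss(p, t))
assert abs(best - t) < 1e-9
```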

CrossEntropyLossLayer["Binary"] applied to scalar probabilities p is equivalent to CrossEntropyLossLayer["Probabilities"] applied to the vector of probabilities {p,1-p}:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-fo9d5z


https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-xoelhc

Demonstrate the same by substitution:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-nwc6dn
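The substitution can also be checked numerically with the formulas themselves. A Python sketch (illustrative only, not Wolfram Language): expanding the "Probabilities" loss on {p, 1-p} against target {t, 1-t} reproduces the "Binary" loss term by term:

```python
import math

def binary_loss(p, t):
    # -(t*log(p) + (1-t)*log(1-p))
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def probabilities_loss(probs, target):
    # -sum_i t_i * log(p_i)
    return -sum(t * math.log(p) for t, p in zip(target, probs))

# Binary loss on scalar p equals the probabilities loss on {p, 1-p}
for p, t in [(0.2, 1), (0.7, 0), (0.5, 0.5)]:
    assert abs(binary_loss(p, t)
               - probabilities_loss([p, 1 - p], [t, 1 - t])) < 1e-12
```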

Here is the function computed by CrossEntropyLossLayer["Probabilities"]:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-87kvat
Evaluate the function on some data:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-ij6swv

This is equivalent to the following:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-c4q3hq

CrossEntropyLossLayer["Probabilities"] is equivalent to CrossEntropyLossLayer["Index"] with a one-hot encoding:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-5f2lur
Evaluating them on sparse label data gives the same result:

https://wolfram.com/xid/0nq8b9wrw92vp5x8aa7aduq-ex8lb8
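The one-hot equivalence follows directly from the formulas: with a one-hot target, every term of -Σ tᵢ log pᵢ vanishes except the one at the target index. A Python sketch (illustrative only, not Wolfram Language):

```python
import math

def index_loss(probs, target):
    # "Index" form: target is a 1-based class index
    return -math.log(probs[target - 1])

def probabilities_loss(probs, target):
    # "Probabilities" form: target is a probability vector
    return -sum(t * math.log(p) for t, p in zip(target, probs))

def one_hot(k, n):
    # unit vector with a 1 at the (1-based) position k
    return [1.0 if i == k - 1 else 0.0 for i in range(n)]

probs = [0.1, 0.2, 0.7]
for k in (1, 2, 3):
    assert abs(index_loss(probs, k)
               - probabilities_loss(probs, one_hot(k, 3))) < 1e-12
```

Note that the index form never materializes the one-hot vector, which is one reason it can be faster and far lighter on memory when the number of classes is very large.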

CrossEntropyLossLayer["Index"] is typically faster to evaluate than CrossEntropyLossLayer["Probabilities"] and can use significantly less memory when the number of classes is very large.
Wolfram Research (2016), CrossEntropyLossLayer, Wolfram Language function, https://reference.wolfram.com/language/ref/CrossEntropyLossLayer.html (updated 2020).