Audio Processing—Wolfram Documentation

Wolfram Language & System Documentation Center

Audio Processing

The Wolfram Language provides built-in support for both programmatic and interactive audio processing, fully integrated with its other mathematical and algorithmic capabilities. You can process audio objects by applying linear and nonlinear filters, adding effects, and analyzing them with audio-specific functions or with the rest of the Wolfram Language.

Filtering

Audio signals can be used as input to many signal processing functions.

LowpassFilter[audio,ω_c]	apply a lowpass filter with a cutoff frequency ω_c to audio
HighpassFilter[audio,ω_c]	apply a highpass filter with a cutoff frequency ω_c to audio
WienerFilter[audio,r]	apply a Wiener filter with a range of r samples to audio
MeanFilter[audio,r]	apply a mean filter with a range of r samples to audio
TotalVariationFilter[audio]	apply a total variation filter to audio
GaussianFilter[audio,r]	apply a Gaussian filter with a range of r samples to audio

Some of the filters that can be directly applied to audio objects.

Many of the filtering functions present in the Wolfram Language can be immediately used on audio objects. In many cases, it is possible to specify the cutoff frequency as a frequency Quantity.

FIR filters applied to an audio object:

Wolfram Language code: a = ExampleData[{"Audio", "Apollo11ReturnSafely"}, "Audio"]

Wolfram Language code: lp = LowpassFilter[a, Quantity[800, "Hertz"], 101]

Wolfram Language code: hp = HighpassFilter[a, Quantity[1600, "Hertz"], 101]

Wolfram Language code: bs = BandstopFilter[a, {Quantity[800, "Hertz"], Quantity[1600, "Hertz"]}, 91]

Plot the periodogram of the original and processed signals:

Wolfram Language code:

Periodogram[{a, bs, lp, hp}, 2000, PlotRange -> {{0, 3000}, Automatic}, Frame -> True, ImageSize -> Medium, PlotLegends -> {"Original", "Bandstop", "Lowpass", "Highpass"}]

Use WienerFilter to denoise a recording:

Wolfram Language code: a = ExampleData[{"Audio", "Apollo11ReturnSafely"}, "Audio"]

Wolfram Language code: WienerFilter[a, 25]

Wolfram Language code: Spectrogram[#, ImageSize -> Small, FrameTicks -> None]& /@ {a, %}

Discrete-time transfer function models can be used to filter an audio object using RecurrenceFilter.

RecurrenceFilter[tf,audio]	uses a discrete-time filter defined by the TransferFunctionModel tf
BiquadraticFilterModel[{"type",spec}]	creates a biquadratic filter of a given {"type",spec}
ButterworthFilterModel[{"type",spec}]	creates a Butterworth filter of a given {"type",spec}
TransferFunctionModel[m,s]	represents the model of the transfer-function matrix m with complex variable s
ToDiscreteTimeModel[lsys,τ]	gives the discrete-time approximation, with sampling period τ, of the continuous-time system model lsys

Some of the available filter models and additional utilities.

The simplest way to use one of the analog (continuous-time) filter models is to discretize the transfer function using ToDiscreteTimeModel and apply the result to an audio object using RecurrenceFilter.

Apply a resonant biquadratic filter to an audio object:

Wolfram Language code: filterModel = BiquadraticFilterModel[{ω, q}]

Wolfram Language code:

discreteFilter = ToDiscreteTimeModel[filterModel, 1 / 22050, z, Method -> {"BilinearTransform", "CriticalFrequency" -> ω}]

Wolfram Language code: a = Audio[…]; (* audio object embedded in the original notebook *)

Wolfram Language code: RecurrenceFilter[discreteFilter /. {ω -> 1000 2 Pi, q -> 7}, a]
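
The same discretize-then-filter pattern can be sketched with a Butterworth model. This is only an illustration: the test signal (one second of white noise), the filter order 4, the 1000 Hz cutoff and the variable names are assumptions chosen for the example, and ToDiscreteTimeModel is left at its default method.

Wolfram Language code:

noise = AudioGenerator["White", 1, SampleRate -> 22050]; (* assumed test signal: 1 s of white noise *)
analog = ButterworthFilterModel[{4, 2. Pi 1000}]; (* 4th-order analog lowpass, cutoff given as an angular frequency *)
digital = ToDiscreteTimeModel[analog, 1/22050]; (* sampling period matches the sample rate of the audio *)
filtered = RecurrenceFilter[digital, noise];
Periodogram[{noise, filtered}, PlotLegends -> {"Original", "Filtered"}]

Keeping the sampling period equal to the reciprocal of the audio sample rate is what makes the cutoff of the analog prototype land at the intended frequency in the filtered audio.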

A discrete transfer function can be created using TransferFunctionModel and applied to an audio object with RecurrenceFilter.

Define a comb filter using TransferFunctionModel and apply it to an audio object:

Wolfram Language code:

combFilterModel[f_, α_, sr : _ : 22050] := TransferFunctionModel[{{1 / ( 1 - α z^-Round[sr / f])}}, z, SamplingPeriod -> (1/sr)]

Wolfram Language code: combFilterModel[150, α]

Wolfram Language code: RecurrenceFilter[combFilterModel[150, -.9], a]
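
The transfer function 1/(1 - α z^-n) used above corresponds to the recurrence y[k] = x[k] + α y[k - n], so each input sample is echoed every n samples with gain α. A minimal sketch of that behavior on a plain list, using the coefficient-list form of RecurrenceFilter; the names gain and delay and their values are hypothetical, chosen only to keep the output short:

Wolfram Language code:

gain = 1/2; delay = 3; (* illustrative values, not those of the audio example *)
impulse = UnitVector[10, 1]; (* a single 1 followed by nine zeros *)
(* first list: coefficients of y[k], y[k-1], ..., y[k-delay]; second list: coefficients of x[k] *)
RecurrenceFilter[{Join[{1}, ConstantArray[0, delay - 1], {-gain}], {1}}, impulse]

The result should be nonzero only at every third sample, with each echo scaled by gain relative to the previous one.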

Effects and Manipulation

An audio object can be modified and manipulated using built-in or user-defined functions.

AudioTimeStretch[audio,r]	apply time stretching by the specified factor r to audio
AudioPitchShift[audio,r]	apply pitch shifting by the specified factor r to audio
AudioReverb[audio]	apply a reverberation effect to audio
AudioDelay[audio,delay]	apply a delay effect with delay time delay to audio
AudioChannelMix[audio,desttype]	mix the channels of audio to the specified desttype

Some common audio effects.

Use pitch shifting and time stretching to independently modify pitch and duration of an audio signal.

Slow down an audio object:

Wolfram Language code: a = ExampleData[{"Audio", "MaleVoice"}, "Audio"]

Wolfram Language code: AudioTimeStretch[a, 1.5]

Shift the pitch of an audio object up:

Wolfram Language code: AudioPitchShift[a, Quantity[2, IndependentUnit["semitones"]]]
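
Because the two operations are independent, they can be chained. The sketch below assumes the ratio form of AudioPitchShift from the table above (two semitones correspond to a frequency ratio of 2^(2/12)) and uses hypothetical variable names; comparing durations is one way to check that only the time stretch changed the length.

Wolfram Language code:

voice = ExampleData[{"Audio", "MaleVoice"}, "Audio"];
slow = AudioTimeStretch[voice, 1.5]; (* 1.5 times longer, pitch unchanged *)
slowAndHigh = AudioPitchShift[slow, N[2^(2/12)]]; (* two semitones up, duration unchanged *)
Duration /@ {voice, slow, slowAndHigh}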

Delay and reverberation effects can be used to immerse a recording in a virtual environment or to produce special effects.

Apply a delay or reverb effect:

Wolfram Language code: AudioDelay[a, .7, .8, .4, PaddingSize -> 2]

Wolfram Language code: AudioTrim[AudioReverb[a, ExampleData[{"Audio", "IRStMarysChurch"}]], 10]

Perform Karplus–Strong synthesis by adding a short delay with a high feedback value to a burst of noise. This will simulate the sound of a vibrating string:

Wolfram Language code:

freq = 60;
feedback = 0.99;
AudioDelay[AudioGenerator["Pink", .01], 1 / freq, feedback, 1, PaddingSize -> 5, Method -> {"LowpassCutoff" -> Quantity[8000, "Hertz"]}]

Downmixing and upmixing to an arbitrary number of channels can be achieved using AudioChannelMix.

Downmix and upmix a multichannel audio object:

Wolfram Language code: a//AudioChannels

Wolfram Language code: AudioChannelMix[a, "Mono"]//AudioChannels

Wolfram Language code: AudioChannelMix[a, 3]//AudioChannels

You can alter a recording by performing arithmetic operations directly on the Audio object. All Wolfram Language operators and functions with the attribute NumericFunction or Listable are overloaded to work with audio objects.

Apply a smooth distortion to an audio object using the Tanh function:

Wolfram Language code: a = AudioNormalize[Import["ExampleData/Rule30.wav"]]

Wolfram Language code: Tanh[5a]

Use the ChebyshevT function to obtain a "waveshaper" effect:

Wolfram Language code: ChebyshevT[10, a]

Multiply an audio object with a sine wave to get a "ring modulator" effect:

Wolfram Language code: a × Audio[…] (* sine-wave audio object embedded in the original notebook *)
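
A self-contained sketch of the same ring-modulation idea, with both signals generated rather than embedded. The frequencies (440 Hz and 30 Hz), the two-second duration and the variable names are assumptions made for illustration. Since sin(x) sin(y) = (cos(x - y) - cos(x + y))/2, the product of two sines should contain the difference and sum frequencies, here about 410 Hz and 470 Hz, instead of the original tones.

Wolfram Language code:

carrier = AudioGenerator[{"Sin", 440}, 2]; (* 440 Hz tone, two seconds *)
modulator = AudioGenerator[{"Sin", 30}, 2]; (* 30 Hz sine wave *)
ring = carrier * modulator; (* sample-by-sample product of the two signals *)
Periodogram[ring, PlotRange -> {{300, 600}, All}]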

Analysis

Analysis of the Whole Signal

Properties can be computed on a recording both on a global and a local scale.

AudioMeasurements[audio,"prop"]	compute the property "prop" for the entire audio

Some of the functionality to compute global measurements.

Both time-domain and frequency-domain properties can be measured with AudioMeasurements. The properties are computed on the sample values averaged over the channels of the audio object.

Compute time domain properties of a recording:

Wolfram Language code:

AudioMeasurements[ExampleData[{"Audio", "Drums"}], {"MinMax", "Mean", "StandardDeviation", "RMSAmplitude", "ZeroCrossings"}, "Dataset"]

Compute frequency domain properties of a recording:

Wolfram Language code:

AudioMeasurements[ExampleData[{"Audio", "Drums"}], {"SpectralCentroid", "SpectralFlatness", "SpectralRollOff", "SpectralSpread"}, "Dataset"]

Compute other measurements by directly applying built-in statistical functions.

Compute moment and entropy of an audio object:

Wolfram Language code: a = AudioGenerator["Sin"];

Wolfram Language code: Moment[a, 2]

Wolfram Language code: Entropy[a]

Unlike AudioMeasurements, overloaded functions are applied to the flattened version of the data. If the input is a multichannel audio object, the sample values from all channels are flattened into a single array.
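
The difference between channel averaging and flattening can be illustrated with a deliberately artificial stereo signal whose two channels cancel each other. The four-sample data and the name stereo are assumptions made only for this sketch.

Wolfram Language code:

stereo = Audio[{{1., 1., 1., 1.}, {-1., -1., -1., -1.}}, SampleRate -> 8000]; (* two channels with opposite sign *)
AudioMeasurements[stereo, "StandardDeviation"] (* measured on the channel average, which is all zeros here *)
StandardDeviation[stereo] (* measured on all eight sample values taken together *)

If the behavior is as described above, the first result should be zero and the second should not, because the flattened data contains both +1 and -1.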

Compute statistical properties of a recording:

Wolfram Language code:

AssociationThread[{"MinMax", "Mean", "StandardDeviation", "RMSAmplitude", "ZeroCrossings"}  -> 
	Through[{MinMax, Mean, StandardDeviation, Sqrt@Mean[# ^ 2]&, Total@CrossingDetect[#, CornerNeighbors -> None]&}@ExampleData[{"Audio", "Drums"}]]
	]//Dataset

Analysis of the Partitioned Signal

In addition to global properties of audio objects, it is also possible to compute measurements locally.

AudioLocalMeasurements[audio,"prop"]	compute the property "prop" locally for partitions of audio
AudioIntervals[audio,crit]	find the intervals of audio for which the criterion crit is satisfied

Some of the functionality to compute local measurements.

In AudioLocalMeasurements, properties are computed locally. The signal is partitioned according to the PartitionGranularity specification, and the requested property is computed on each partition. The result is returned as a TimeSeries whose timestamps correspond to the center of each partition.

Compute the RMS amplitude on 40 ms partitions with an offset of 1 ms:

Wolfram Language code:

a = Audio[…]; (* audio object embedded in the original notebook *)
res = AudioLocalMeasurements[a, "RMSAmplitude", PartitionGranularity -> {Quantity[40, "Milliseconds"], Quantity[1, "Milliseconds"]}];
Show[AudioPlot[a], ListLinePlot[res, PlotStyle -> Red]]
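
To see the partitioning more directly, the TimeSeries returned by AudioLocalMeasurements can be inspected through its "Times" and "Values" properties. The sketch below uses an assumed test tone and non-overlapping 100 ms partitions; the names tone and rms are hypothetical.

Wolfram Language code:

tone = AudioGenerator[{"Sin", 440}, 1]; (* assumed test signal: 1 s sine wave *)
rms = AudioLocalMeasurements[tone, "RMSAmplitude", PartitionGranularity -> {Quantity[100, "Milliseconds"], Quantity[100, "Milliseconds"]}];
Take[rms["Times"], 3] (* timestamps of the first three partitions *)
Take[rms["Values"], 3] (* RMS amplitude of the first three partitions *)

With timestamps at the partition centers, the first entries should be spaced 100 ms apart, and the values should be roughly constant for a steady tone.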

The result of AudioLocalMeasurements or AudioMeasurements can be used as an input for other functionality in the Wolfram Language.

Use the "SpectralCentroid" and "SpectralSpread" measurements to find clusters of similar audio objects in a list:

Wolfram Language code:

samples = {Audio[…], Audio[…], …}; (* nine audio objects embedded in the original notebook *)
props = AudioMeasurements[#, {"SpectralCentroid", "SpectralSpread"}, "List"]& /@ samples;
clusters = FindClusters[props -> props]

Wolfram Language code:

ListPlot[Partition[props, 1], PlotMarkers -> samples, ImageSize -> 300, Prolog -> ({Opacity[.3, RGBColor[1, 0.5, 0.5]], Disk[Mean@#, 400]& /@ clusters}), PlotRange -> {{0, 4000}, {0, 4000}}, AspectRatio -> 1]

Use the MFCC measurement as a feature to compute the distance between various elements of the ExampleData["Audio"] collection:

Wolfram Language code:

list = Select[ExampleData["Audio"], ExampleData[#, "Duration"] < 10&];
a = ConformAudio[AudioNormalize@AudioChannelMix[#, 1]& /@ ExampleData[#, "Audio"]& /@ list, SampleRate -> 11025];

Wolfram Language code: mfcc = AudioLocalMeasurements[#, "MFCC", PartitionGranularity -> {.05, .01}]["Values"]& /@ a;

Wolfram Language code:

ticks = Thread[{Range[Length@list], Text /@ list[[All, 2]]}];
MatrixPlot[DistanceMatrix[mfcc], ImageSize -> Medium, FrameTicks -> {ticks, Apply[Rotate[#, Pi / 2]&, ticks, {2}]}]

AudioIntervals extracts the intervals on which a user-defined criterion is satisfied.

Locate the non-voiced sections of a recording:

Wolfram Language code:

a = ExampleData[{"Audio", "NoisyTalk"}, "Audio"];
nonVoicedIntervals = AudioIntervals[a, #RMSAmplitude < .02 && #SpectralFlatness > .0001&, .1, PartitionGranularity -> {.06, .01}]

Wolfram Language code:

AudioPlot[a, Epilog -> {RGBColor[1, 0, 0, .3], Rectangle[{#[[1]], -1}, {#[[2]], 1}]& /@ nonVoicedIntervals}, ImageSize -> Medium]
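
Once the intervals are known, they can be passed to other functions, for example to cut the matching segments out of the recording. The following is a sketch on a synthetic signal (silence, a tone, then silence); the signal, the threshold 0.1 and the names clip, loud and segments are assumptions made for illustration.

Wolfram Language code:

clip = AudioJoin[{AudioGenerator["Silence", 1], AudioGenerator[{"Sin", 440}, 1], AudioGenerator["Silence", 1]}];
loud = AudioIntervals[clip, #RMSAmplitude > 0.1 &] (* intervals where the local RMS amplitude exceeds the threshold *)
segments = AudioTrim[clip, #] & /@ loud (* one audio object per detected interval *)

For this signal, the detected interval should roughly cover the middle second, where the tone is playing.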

High-Level Analysis

Neural network–based functions can be used to gain deeper insight into the contents of a signal.

SpeechRecognize[audio]	recognize the speech in audio and return it as a string
PitchRecognize[audio]	recognize the main pitch in audio
AudioIdentify[audio]	identify what audio is the recording of

Some of the neural network–based functionality.

Perform high-level audio analysis:

Wolfram Language code: a = SpeechSynthesize["hello i am a computer"]

Wolfram Language code: SpeechRecognize[a]

Wolfram Language code: PitchRecognize[a]//ListLinePlot

Wolfram Language code: AudioIdentify[a]

All the machine learning functions are aware of Audio objects and perform their computations starting with a semantically significant feature extraction.

Classify[{audio₁->class₁,audio₂->class₂,…}]	generate a ClassifierFunction[…] trained on the examples and classes given
FeatureExtraction[{audio₁,audio₂,…}]	generate a FeatureExtractorFunction[…] trained on the examples given
FeatureSpacePlot[{audio₁,audio₂,…}]	plot the features extracted from each audioᵢ as a scatter plot

Some of the high-level machine learning functions.

This preprocessing transforms each audio object into a fixed-size vector, so that the objects can be easily compared.
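
A minimal sketch of this fixed-size property, assuming three generated signals of different durations (the names sounds and extractor are hypothetical). Evaluating it may require the system to download pretrained resources, and the actual vector length depends on the extractor that is built.

Wolfram Language code:

sounds = {AudioGenerator[{"Sin", 440}, 1], AudioGenerator["White", 2], AudioGenerator["Pink", 3]}; (* durations of 1, 2 and 3 seconds *)
extractor = FeatureExtraction[sounds];
Dimensions /@ extractor[sounds] (* one feature vector per audio object *)

The three dimension lists should be identical even though the inputs differ in length.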

Classify a small dataset of instrument sounds:

Wolfram Language code:

data = Join@@Table[Thread[WebAudioSearch[instr, "Samples", #Duration < 5&, MaxItems -> 40] -> instr], {instr, {"acoustic guitar", "drums", "trumpet"}}];
{train, test} = TakeDrop[RandomSample[data], 90];

Wolfram Language code: cl = Classify[train]

Wolfram Language code: ClassifierMeasurements[cl, test, "Report"]

Plot the features of an audio collection using FeatureSpacePlot.

Plot a list of signals in a semantically meaningful space with FeatureSpacePlot:

Wolfram Language code: FeatureSpacePlot[ExampleData[#] -> #[[2]]& /@ ExampleData["Audio"], LabelingFunction -> Callout]

Neural Networks

The Audio object is tightly integrated into the neural network framework. NetEncoder provides an easy entry point into neural nets for various high-level constructs such as the Audio object.

"Audio"	encode a signal as a waveform
"AudioSpectrogram"	encode a signal as a spectrogram
"AudioMFCC"	encode a signal as a sequence of mel-frequency cepstral coefficient (MFCC) vectors

Some examples of audio encoders available through NetEncoder.

Different encoders can be used to compute various kinds of features. Some preserve all of the information in the original signal (like "Audio" and "AudioSTFT"), while others discard some information in exchange for a dramatic reduction in dimensionality (like "AudioMFCC").

Visualize features computed using different encoders:

Wolfram Language code: a = ExampleData[{"Audio", "Bird"}, "Audio"]

Wolfram Language code: Labeled[ListLinePlot[First@Transpose@Normal[NetEncoder["Audio"][a]]], "Audio"]

Wolfram Language code:

Labeled[MatrixPlot[Log@Transpose@Normal[NetEncoder["AudioSpectrogram"][a]], DataReversed -> {True, False}], "AudioSpectrogram"]

Wolfram Language code: Labeled[MatrixPlot[Transpose@Normal[NetEncoder["AudioMFCC"][a]], DataReversed -> {True, False}], "AudioMFCC"]
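
The dimensionality trade-off can also be checked numerically by comparing the shape of each encoder's output for the same input. This sketch assumes a one-second generated tone and default encoder parameters; the name tone is hypothetical.

Wolfram Language code:

tone = AudioGenerator[{"Sin", 440}, 1]; (* assumed test signal *)
(* for each encoder name, encode the tone and report the dimensions of the result *)
AssociationMap[Dimensions[NetEncoder[#][tone]] &, {"Audio", "AudioSpectrogram", "AudioMFCC"}]

The exact numbers depend on the default sample rate and window settings of each encoder, so they are not listed here.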

Using the encoders, it is easy to train networks from scratch to solve audio-related tasks and produce measurements of the resulting performance.

NetTrain[net,data]	train the network net on the dataset data
NetChain[{layer₁,layer₂,…}]	specify a net in which the output of layerᵢ is connected to the input of layerᵢ₊₁
NetMeasurements[net,data,measurement]	compute the requested measurement for the net evaluated on data

Some of the neural networks functionality.

Use NetChain and NetGraph to create networks of arbitrary topology, and use sequence-focused layers such as GatedRecurrentLayer and LongShortTermMemoryLayer to analyze variable-length signals.

Train a network to classify spoken digits from 0 to 9:

Wolfram Language code:

net = NetChain[{GatedRecurrentLayer[64], GatedRecurrentLayer[64], GatedRecurrentLayer[10], AggregationLayer[Mean, 1], SoftmaxLayer[]}, "Input" -> NetEncoder["AudioMFCC"], "Output" -> NetDecoder[{"Class", Range[0, 9]}]]

Wolfram Language code: trainedNet = NetTrain[net, ResourceData["Spoken Digit Commands"], ValidationSet -> Scaled[.05]]

Wolfram Language code: NetMeasurements[trainedNet, ResourceData["Spoken Digit Commands", "TestDataset"], {"Accuracy", "ConfusionMatrixPlot"}]
