Product Data Sheet / Brochure

NVIDIA A2 TENSOR CORE GPU | DATASHEET | 2
Higher IVA Performance for Intelligent Edge
Servers equipped with A2 offer up to 1.3X more performance in intelligent edge use cases,
including smart cities, manufacturing, and retail. NVIDIA A2 GPUs running intelligent video
analytics (IVA) workloads result in more efficient deployments, with up to 1.6X better
price-performance and ten percent better energy efficiency than previous GPU generations.
NVIDIA A2 Brings Breakthrough NVIDIA Ampere Architecture Innovations
THIRDENERATION TENSOR ORES
The thrd-generaton Tensor ores n A2 support nteger math,
down to INT4, and floatng pont math, up to FP32, to delver
hgh AI tranng and nference performance The NVIDIA Ampere
archtecture also supports TF32 and NVIDIAs automatc mxed
precson (AMP) capabltes
ROOT OF TRUST SEURITY
Provdng securty n edge deployments and end-ponts s crtcal
for enterprse busness operatons A2 offers secure boot through
trusted code authentcaton and hardened rollback protectons to
protect aganst malcous malware attacks
SEONDENERATION RT ORES
A2 ncludes dedcated RT ores for ray tracng that enable
groundbreakng technologes at breakthrough speed
Wth up to 2X the throughput over the prevous generaton and
the ablty to concurrently run ray tracng wth ether shadng or
denosng capabltes
HARDWARE TRANSODIN PERFORMANE
Exponental growth n vdeo applcatons demand real-tme
scalable performance, requrng the latest n hardware encode
and decode capabltes A2 PUs use dedcated hardware to fully
accelerate vdeo decodng and encodng for the most popular
codecs, ncludng H265, H264, VP9, and AV1 decode
System onguratonPUHPE DL380 en10 Plus, 2S Xeon old 6330N
22Hz, 512B DDR4 | omputer VsonEfcentDet-D0 (OO, 512x512) |
TensorRT 82, PrecsonINT8, BS8 (PU) | OpenVINO 20214, PrecsonINT8,
BS8 (PU)
6X 10X
8X
1X
8X2X 4X
Computer Vision (EfficientDet-DO)
System onguratonPUHPE DL380 en10 Plus, 2S Xeon old 6330N
22Hz, 512B DDR4 | NLP BERT-Large (Sequence length384, SQuAD
v11) | TensorRT 82, PrecsonINT8, BS1 (PU) | OpenVINO 20214,
PrecsonINT8, BS1 (PU)
8X
Natural Language Processing (BERT-Large)
System onguratonPUHPE DL380 en10 Plus, 2S Xeon old 6330N
22Hz, 512B DDR4 | Text-to-SpeechTacotron2 + Waveglow end-to-end
ppelne (nput length 128) | PyTorch 19, PrecsonFP16, BS1 (PU) | PyTorch
19, PrecsonFP32, BS1 (PU)
15X 20X 25X
20X
1X
5X 10X
Text-to-Speech (Tacotron2 + Waveglow)
A2 Improves Performance by Up to 1.3X Versus T4
IVA Performance (Normalized)
[Figure: relative performance in 1080p30 video streams for ShuffleNet v2 and MobileNet v2,
with NVIDIA T4 = 1.0X; NVIDIA A2 reaches 1.2X and 1.3X across the two networks.]
System Configuration: [Supermicro SYS-1029GQ-TRT, 2S Xeon Gold 6240 @ 2.6GHz, 512GB DDR4,
1x NVIDIA A2 or 1x NVIDIA T4] | Measured performance with DeepStream 5.1. Networks:
ShuffleNet-v2 (224x224), MobileNet-v2 (224x224) | The pipeline represents end-to-end
performance, including video capture and decode, pre-processing, batching, inference, and
post-processing.
Lower Power and Configurable TDP
A2 Reduces Power Consumption by Up to 40% Versus T4
[Figure: TDP operating range (watts) for NVIDIA A2 and NVIDIA T4, plotted on a 40-75 W axis.]
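A power comparison like the one above rests on simple arithmetic: the reduction is one minus the ratio of the new power to the baseline. The sketch below assumes rated TDPs of 40-60 W (configurable) for A2 and 70 W for T4. These inputs are assumptions and are not read from the chart, and `power_reduction` is a hypothetical helper name.

```python
def power_reduction(new_watts: float, baseline_watts: float) -> float:
    """Fraction by which new_watts undercuts baseline_watts (0.25 means 25% lower)."""
    if baseline_watts <= 0:
        raise ValueError("baseline_watts must be positive")
    return 1.0 - new_watts / baseline_watts


if __name__ == "__main__":
    T4_TDP_W = 70.0  # assumption: T4 rated TDP
    for a2_tdp_w in (40.0, 60.0):  # assumption: ends of the A2 configurable TDP range
        saving = power_reduction(a2_tdp_w, T4_TDP_W)
        print(f"A2 at {a2_tdp_w:.0f} W: {saving:.1%} lower TDP than T4")
```

With these assumed inputs, the rated-TDP reduction runs from about 14 percent to about 43 percent, depending on where the A2 TDP is configured. The 40 percent headline is NVIDIA's own figure and is not derived by this sketch.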
Inference Speedup (Computer Vision, Natural Language Processing, and Text-to-Speech charts):
comparisons of one NVIDIA A2 Tensor Core GPU versus a dual-socket Xeon Gold 6330N CPU, with
the CPU normalized to 1X. On Natural Language Processing (BERT-Large), NVIDIA A2 reaches 7X.