Product Data Sheet / Brochure
NVIDIA A2 TENSOR CORE GPU | DATASHEET | 2
Higher IVA Performance for Intelligent Edge
Servers equipped with A2 offer up to 1.3X more performance in intelligent edge use cases,
including smart cities, manufacturing, and retail. NVIDIA A2 GPUs running IVA workloads
result in more efficient deployments with up to 1.6X better price-performance and ten
percent better energy efficiency than previous GPU generations.
NVIDIA A2 Brings Breakthrough NVIDIA Ampere
Architecture Innovations
THIRD-GENERATION TENSOR CORES
The third-generation Tensor Cores in A2 support integer math,
down to INT4, and floating point math, up to FP32, to deliver
high AI training and inference performance. The NVIDIA Ampere
architecture also supports TF32 and NVIDIA's automatic mixed
precision (AMP) capabilities.
ROOT OF TRUST SECURITY
Providing security in edge deployments and end-points is critical
for enterprise business operations. A2 offers secure boot through
trusted code authentication and hardened rollback protections to
protect against malicious malware attacks.
SECOND-GENERATION RT CORES
A2 includes dedicated RT Cores for ray tracing that enable
groundbreaking technologies at breakthrough speed, with up to
2X the throughput of the previous generation and the ability to
concurrently run ray tracing with either shading or denoising
capabilities.
HARDWARE TRANSCODING PERFORMANCE
Exponential growth in video applications demands real-time
scalable performance, requiring the latest in hardware encode
and decode capabilities. A2 GPUs use dedicated hardware to fully
accelerate video decoding and encoding for the most popular
codecs, including H.265, H.264, VP9, and AV1 decode.
Computer Vision (EfficientDet-D0)
[Figure: inference speedup bar chart. NVIDIA A2: 8X; CPU: 1X.]
System Configuration: CPU: HPE DL380 Gen10 Plus, 2S Xeon Gold 6330N
2.2GHz, 512GB DDR4 | Computer Vision: EfficientDet-D0 (COCO, 512x512) |
TensorRT 8.2, Precision: INT8, BS: 8 (GPU) | OpenVINO 2021.4, Precision: INT8,
BS: 8 (CPU)
Natural Language Processing (BERT-Large)
[Figure: inference speedup bar chart. NVIDIA A2: 7X; CPU: 1X.]
System Configuration: CPU: HPE DL380 Gen10 Plus, 2S Xeon Gold 6330N
2.2GHz, 512GB DDR4 | NLP: BERT-Large (Sequence length: 384, SQuAD
v1.1) | TensorRT 8.2, Precision: INT8, BS: 1 (GPU) | OpenVINO 2021.4,
Precision: INT8, BS: 1 (CPU)
Text-to-Speech (Tacotron2 + Waveglow)
[Figure: inference speedup bar chart. NVIDIA A2: 20X; CPU: 1X.]
System Configuration: CPU: HPE DL380 Gen10 Plus, 2S Xeon Gold 6330N
2.2GHz, 512GB DDR4 | Text-to-Speech: Tacotron2 + Waveglow end-to-end
pipeline (input length: 128) | PyTorch 1.9, Precision: FP16, BS: 1 (GPU) | PyTorch
1.9, Precision: FP32, BS: 1 (CPU)
IVA Performance (Normalized)
A2 Improves Performance by Up to 1.3X Versus T4
[Figure: relative performance (video streams, 1080p30) on ShuffleNet v2 and
MobileNet v2. NVIDIA T4: 1.0X on both networks; NVIDIA A2: 1.2X and 1.3X.]
System Configuration: [Supermicro SYS-1029GQ-TRT, 2S Xeon Gold 6240 2.6GHz,
512GB DDR4, 1x NVIDIA A2 OR 1x NVIDIA T4] | Measured performance with
DeepStream 5.1. Networks: ShuffleNet-v2 (224x224), MobileNet-v2 (224x224) |
Pipeline represents end-to-end performance with video capture and decode,
pre-processing, batching, inference, and post-processing.
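As an illustration of how a normalized figure such as "1.3X versus T4" is typically derived, the sketch below divides each GPU's sustained 1080p30 stream count by the baseline's count. The stream counts are hypothetical placeholders chosen only to make the arithmetic visible; they are not measurements from this datasheet, and the function name `normalize_to_baseline` is an assumption made for this example.

```python
def normalize_to_baseline(streams, baseline):
    """Return each entry's stream count relative to the baseline entry."""
    base = streams[baseline]
    if base <= 0:
        raise ValueError("baseline stream count must be positive")
    # Round to two decimals, matching the precision of chart labels.
    return {name: round(count / base, 2) for name, count in streams.items()}


# Hypothetical 1080p30 stream counts (NOT datasheet measurements).
example_streams = {"NVIDIA T4": 20, "NVIDIA A2": 26}

relative = normalize_to_baseline(example_streams, baseline="NVIDIA T4")
for name, factor in relative.items():
    print(f"{name}: {factor}X")
```

With these placeholder counts the baseline normalizes to 1.0X by construction, so only the ratio between the two devices carries information.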
Lower Power and Configurable TDP
A2 Reduces Power Consumption by Up to 40% Versus T4
[Figure: TDP operating range (watts) for NVIDIA A2 and NVIDIA T4; axis from
40 to 75 watts.]
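The "up to 40%" power claim can be sanity-checked with a short calculation. This is a rough sketch that assumes a configurable A2 TDP range of 40 to 60 watts and a T4 TDP of 70 watts; those wattages are assumptions taken from the products' published specifications, not values stated in the text here, and the helper name `power_reduction_percent` is hypothetical.

```python
def power_reduction_percent(baseline_watts, new_watts):
    """Percentage by which new_watts is lower than baseline_watts."""
    if baseline_watts <= 0:
        raise ValueError("baseline wattage must be positive")
    return (baseline_watts - new_watts) / baseline_watts * 100.0


# Assumed TDP values (not stated in the surrounding text).
T4_TDP_WATTS = 70.0
A2_TDP_RANGE_WATTS = (40.0, 60.0)  # lowest and highest configurable TDP

best_case = power_reduction_percent(T4_TDP_WATTS, A2_TDP_RANGE_WATTS[0])
worst_case = power_reduction_percent(T4_TDP_WATTS, A2_TDP_RANGE_WATTS[1])

print(f"Reduction at lowest A2 TDP:  {best_case:.1f}%")
print(f"Reduction at highest A2 TDP: {worst_case:.1f}%")
```

Under these assumptions the reduction at the lowest A2 setting comes out slightly above 40%, which is consistent with the rounded "up to 40%" headline, while the highest setting yields a much smaller saving.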
Inference Speedup: comparisons of one NVIDIA A2 Tensor Core GPU versus a
dual-socket Xeon Gold 6330N CPU.