DGX-A100

The CVBLab research group, part of the I3B at UPV, upgraded its facilities in 2020 with a high-performance computing system that includes fully integrated, hardware-optimized software for deep learning.

This equipment was funded by the European Union through the ERDF Operational Programme of the Valencian Community 2014-2020, under grant number IDIFEDER/2020/030. These grants were published in DOGV number 8596 on 22 July 2019.

With this grant, the group purchased dedicated infrastructure for artificial intelligence (AI): the NVIDIA DGX A100 system. The system is optimized for all AI workloads, from analytics to machine-learning training and inference. The DGX A100 offers about 5 petaFLOPS of computing power with Multi-Instance GPU (MIG) capability in a single system, which allows resources to be assigned to specific workloads so that the largest and most complex jobs are supported alongside the smallest and simplest.

The main goal of this equipment is to support research in digital pathology: specifically, to design, develop and apply deep learning techniques, based on convolutional neural networks and transformers, to digital histological images. The large size of these images, on the order of one gigabyte each, together with the fact that deep learning requires optimizing the parameters of the neural-network design, justifies the need for a hardware and software system with high computing power. The software must harness all of the hardware's power, integrating every component as efficiently as possible to guarantee an optimal flow of data and computation. It must also facilitate the design and optimization of neural networks for the problem at hand, as well as data import, data curation, visualization, etc.

Until now, the Universitat Politècnica de València did not have hardware as powerful as the system acquired thanks to this infrastructure grant from the Generalitat Valenciana, and very few national and international centers have it. This will help the university grow and position itself as a benchmark in artificial intelligence. A system with these characteristics will be vital for the research group to meet its objectives, especially those of the European project it currently coordinates (CLARIFY, 860627), and to undertake the other projects in which it participates with greater guarantees.

SYSTEM SPECIFICATIONS

HARDWARE

  • CPU: Dual AMD Rome 7742, 128 cores total, 2.25 GHz (base), 3.4 GHz (max boost).
  • System memory: 1 TB.
  • GPUs: 8x NVIDIA A100 Tensor Core GPUs, 320 GB total GPU memory, Multi-Instance GPU mode.
  • Storage: OS: 2x 1.92 TB M.2 NVMe drives; internal storage: 15 TB (4x 3.84 TB) U.2 NVMe drives.
  • Networking: 8x single-port 200 Gb/s HDR InfiniBand and 1x dual-port. InfiniBand connectivity guarantees 450 Gb/s bidirectional bandwidth.
  • Performance: 5 petaFLOPS AI, 10 petaOPS INT8.
  • System power usage: 6.5 kW max.
  • System weight: 123 kg.
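
As a rough illustration of how the aggregate figures above break down per GPU, the following Python sketch does the arithmetic. It is not part of any vendor tooling. The limit of 7 MIG instances per A100 is an assumption taken from NVIDIA's general MIG documentation, not from this page, and the function name `summarize` is hypothetical.

```python
# Figures copied from the hardware list above.
NUM_GPUS = 8
TOTAL_GPU_MEMORY_GB = 320
DATA_DRIVES = 4
DATA_DRIVE_TB = 3.84
AI_PETAFLOPS = 5

# Assumption (not stated on this page): one A100 can be partitioned
# into at most 7 MIG instances.
MAX_MIG_INSTANCES_PER_GPU = 7


def summarize():
    """Derive per-GPU and aggregate figures from the listed specifications."""
    return {
        "gpu_memory_per_gpu_gb": TOTAL_GPU_MEMORY_GB / NUM_GPUS,
        "raw_data_storage_tb": round(DATA_DRIVES * DATA_DRIVE_TB, 2),
        "ai_teraflops_per_gpu": AI_PETAFLOPS * 1000 / NUM_GPUS,
        "max_mig_instances": NUM_GPUS * MAX_MIG_INSTANCES_PER_GPU,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumptions, each GPU contributes 40 GB of memory, the four data drives add up to 15.36 TB of raw capacity (listed above as 15 TB), and the system could expose up to 56 isolated GPU instances for small workloads.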

SOFTWARE

  • DGX OS 5 (Ubuntu Linux OS).
  • Software for management, monitoring and planning.
  • Optimized environment based on Docker containers for machine-learning workloads, in particular deep learning.

Agency

Generalitat Valenciana / European Union through the European Regional Development Fund (ERDF) of the Valencian Community (IDIFEDER/2020/030)

Years

2020 to 2021
