NVIDIA NCP-AIO - NVIDIA Certified Professional AI Operations Certification Exam
Question #1 (Topic: Demo Questions)
A Slurm user needs to submit a batch job script for execution tomorrow. Which command should be used to complete this task?
Correct Answer: A
Explanation:
In Slurm, the command that submits a batch job script is sbatch. It queues the script with the Slurm workload manager, which runs it once resources are available and the job becomes eligible. The option --begin=tomorrow (two hyphens, no space) defers eligibility until tomorrow, so the job will not start before then. The other commands have different purposes: submit is not a valid Slurm command. salloc obtains a resource allocation, typically for interactive use, and does not submit a batch script for scheduled execution. srun launches tasks, either inside an existing allocation or interactively, and is not used for batch job submission. Therefore, the correct command to submit a batch job script for execution tomorrow is sbatch --begin=tomorrow.
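As a minimal sketch of the command form discussed above, the following Python snippet builds the sbatch command line with a deferred start time. The function name build_sbatch_command and the script name train_job.sh are hypothetical and used only for illustration; the snippet prints the command rather than running it, so it does not need a Slurm installation.

```python
import shlex


def build_sbatch_command(script_path, begin="tomorrow"):
    """Return the argv list for submitting a batch script with a deferred start."""
    # --begin is a long-form sbatch option, so it takes two leading hyphens.
    return ["sbatch", f"--begin={begin}", script_path]


if __name__ == "__main__":
    # train_job.sh is a hypothetical script name, not taken from the question.
    command = build_sbatch_command("train_job.sh")
    # Print the shell-ready command instead of executing it, so the sketch
    # also works on machines without Slurm installed.
    print(shlex.join(command))
```

On a Slurm login node, the returned list could be passed to subprocess.run to perform the actual submission.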
Question #2 (Topic: Demo Questions)
A system administrator needs to optimize the delivery of their AI applications to the edge. What NVIDIA platform should be used?
Correct Answer: C
Explanation:
NVIDIA Fleet Command is the platform designed specifically to optimize and manage the deployment and delivery of AI applications at the edge. It enables secure and scalable orchestration of AI workloads across distributed edge devices, providing lifecycle management, remote monitoring, and updates. Fleet Command facilitates running AI applications closer to where data is generated (edge), improving latency and operational efficiency. Base Command Platform and Base Command Manager primarily target data center and AI cluster management for configuration, monitoring, and troubleshooting. NetQ is focused on network telemetry and network state monitoring rather than application delivery. Therefore, for AI application delivery and optimization at the edge, Fleet Command is the recommended NVIDIA platform.
Question #3 (Topic: Demo Questions)
You are a Solutions Architect designing a data center infrastructure for a cloud-based AI application that requires high-performance networking, storage, and security. You need to choose a software framework to program the NVIDIA BlueField DPUs that will be used in the infrastructure. The framework must support the development of custom applications and services, as well as enable tailored solutions for specific workloads. Additionally, the framework should allow for the integration of storage services such as NVMe over Fabrics (NVMe-oF) and elastic block storage. Which framework should you choose?
Correct Answer: D
Explanation:
NVIDIA DOCA (Data Center Infrastructure-on-a-Chip Architecture) is the software framework designed to program NVIDIA BlueField DPUs (Data Processing Units). DOCA provides libraries, APIs, and tools to develop custom applications, enabling users to offload, accelerate, and secure data center infrastructure functions on BlueField DPUs. DOCA supports integration with key data center services, including storage protocols such as NVMe over Fabrics (NVMe-oF), elastic block storage, and network security and telemetry. It enables tailored solutions optimized for specific workloads and high-performance infrastructure demands. TensorRT is focused on AI inference optimization. CUDA is NVIDIA's GPU programming model for general-purpose GPU computing, not for DPUs. Nsight is a family of development tools for debugging and profiling applications on NVIDIA GPUs. Therefore, NVIDIA DOCA is the correct framework for programming BlueField DPUs in a data center environment requiring custom application development and advanced storage/networking
Question #4 (Topic: Demo Questions)
A system administrator needs to collect the information below:
GPU behavior monitoring
GPU configuration management
GPU policy oversight
GPU health and diagnostics
GPU accounting and process statistics
NVSwitch configuration and monitoring
What single tool should be used?
Correct Answer: C
Explanation:
The NVIDIA Data Center GPU Manager (DCGM) is the management tool that covers every item on the list: monitoring GPU behavior, managing configurations, enforcing policies, running health checks and diagnostics, collecting accounting and process statistics, and configuring and monitoring NVSwitches. DCGM is designed for large-scale GPU management in data centers and AI clusters, providing detailed telemetry and control over NVIDIA GPUs and NVSwitches.
nvidia-smi provides GPU monitoring but lacks full policy and NVSwitch management.
CUDA Toolkit is for GPU programming and development.
Nsight Systems is focused on performance profiling and debugging.
Therefore, DCGM is the single tool that meets all the listed requirements.
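To make the mapping concrete, the sketch below pairs each requirement from the question with the subcommands of dcgmi (DCGM's command-line interface) that most plausibly cover it. The pairing is an illustrative assumption rather than an official mapping, and the helper name help_commands is hypothetical. The snippet only prints each subcommand's help invocation and does not contact a DCGM host engine.

```python
import shlex

# Assumed pairing of the question's requirements with dcgmi subcommands.
DCGMI_SUBCOMMANDS = {
    "GPU behavior monitoring": ("dmon",),
    "GPU configuration management": ("config",),
    "GPU policy oversight": ("policy",),
    "GPU health and diagnostics": ("health", "diag"),
    "GPU accounting and process statistics": ("stats",),
    "NVSwitch configuration and monitoring": ("nvswitch",),
}


def help_commands(requirement):
    """Return the argv lists that show help for each matching subcommand."""
    return [["dcgmi", sub, "--help"] for sub in DCGMI_SUBCOMMANDS[requirement]]


if __name__ == "__main__":
    for requirement in DCGMI_SUBCOMMANDS:
        for command in help_commands(requirement):
            # Print only; nothing is executed against a real DCGM install.
            print(f"{requirement}: {shlex.join(command)}")
```

Running the printed commands on a host with DCGM installed shows the options each subcommand accepts.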
Question #5 (Topic: Demo Questions)
A system administrator notices that jobs are failing intermittently on Base Command Manager due to incorrect GPU configurations in Slurm. The administrator needs to ensure that jobs utilize GPUs correctly.
How should they troubleshoot this issue?
Correct Answer: B
Explanation: