NCP-AIO NVIDIA Practice Questions

Question #1 (Topic: Demo Questions)

A Slurm user needs to submit a batch job script for execution tomorrow. Which command should be used to complete this task?

A.

sbatch --begin=tomorrow

B.

submit --begin=tomorrow

C.

salloc --begin=tomorrow

D.

srun --begin=tomorrow

Correct Answer: A

Explanation:

In Slurm, sbatch is the command that submits a batch job script to the workload manager for scheduled execution. The option --begin=tomorrow defers the job's start time until tomorrow.
submit is not a valid Slurm command.
salloc obtains a resource allocation for interactive use; it does not submit batch scripts for scheduled execution.
srun launches tasks interactively or as job steps inside an existing allocation; it is not used for batch job submission.
Therefore, the correct command to submit a batch job script for future execution is sbatch --begin=tomorrow.
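
The deferred submission described above can be sketched in Python as a small helper that assembles the sbatch command line. This is only an illustration: the function name build_sbatch_command and the script name train_job.sh are hypothetical, and the helper builds the argument list without contacting a Slurm cluster.

```python
import shlex


def build_sbatch_command(script_path, begin="tomorrow"):
    """Return the argv list for a deferred sbatch submission."""
    # --begin is sbatch's long option for deferring a job's start time.
    return ["sbatch", f"--begin={begin}", script_path]


# train_job.sh is a hypothetical batch script name used for illustration.
cmd = build_sbatch_command("train_job.sh")
print(shlex.join(cmd))
```

On a host where Slurm is installed, the returned list could be passed to subprocess.run to perform the actual submission.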

Question #2 (Topic: Demo Questions)

A system administrator needs to optimize the delivery of their AI applications to the edge. What NVIDIA platform should be used?

A.

Base Command Platform

B.

Base Command Manager

C.

Fleet Command

D.

NetQ

Correct Answer: C

Explanation:

NVIDIA Fleet Command is the platform designed specifically to optimize and manage the deployment and delivery of AI applications at the edge. It enables secure and scalable orchestration of AI workloads across distributed edge devices, providing lifecycle management, remote monitoring, and updates. Fleet Command facilitates running AI applications closer to where data is generated (edge), improving latency and operational efficiency. Base Command Platform and Base Command Manager primarily target data center and AI cluster management for configuration, monitoring, and troubleshooting. NetQ is focused on network telemetry and network state monitoring rather than application delivery. Therefore, for AI application delivery and optimization at the edge, Fleet Command is the recommended NVIDIA platform.

Question #3 (Topic: Demo Questions)

You are a Solutions Architect designing a data center infrastructure for a cloud-based AI application that requires high-performance networking, storage, and security. You need to choose a software framework to program the NVIDIA BlueField DPUs that will be used in the infrastructure. The framework must support the development of custom applications and services, as well as enable tailored solutions for specific workloads. Additionally, the framework should allow for the integration of storage services such as NVMe over Fabrics (NVMe-oF) and elastic block storage. Which framework should you choose?

A.

NVIDIA TensorRT

B.

NVIDIA CUDA

C.

NVIDIA Nsight

D.

NVIDIA DOCA

Correct Answer: D

Explanation:

NVIDIA DOCA (Data Center Infrastructure-on-a-Chip Architecture) is the software framework designed to program NVIDIA BlueField DPUs (Data Processing Units). DOCA provides libraries, APIs, and tools for developing custom applications, enabling users to offload, accelerate, and secure data center infrastructure functions on BlueField DPUs. It supports integration with key data center services, including storage protocols such as NVMe over Fabrics (NVMe-oF) and elastic block storage, as well as network security and telemetry, and it enables tailored solutions for specific workloads and high-performance infrastructure demands. TensorRT is focused on AI inference optimization. CUDA is NVIDIA's programming model for general-purpose GPU computing, not for DPUs. Nsight is a development environment for debugging and profiling applications on NVIDIA GPUs. Therefore, NVIDIA DOCA is the correct framework for programming BlueField DPUs in a data center environment requiring custom application development and advanced storage/networking integration.

Question #4 (Topic: Demo Questions)

A system administrator needs to collect the information below:
GPU behavior monitoring
GPU configuration management
GPU policy oversight
GPU health and diagnostics
GPU accounting and process statistics
NVSwitch configuration and monitoring
What single tool should be used?

A.

nvidia-smi

B.

CUDA Toolkit

C.

DCGM

D.

Nsight Systems

Correct Answer: C

Explanation:

The NVIDIA Data Center GPU Manager (DCGM) is the comprehensive management tool that provides all the requested functionalities: monitoring GPU behavior, managing configurations, enforcing policies, health diagnostics, process accounting, and NVSwitch monitoring. DCGM is designed for large-scale GPU management in data centers and AI clusters, providing detailed telemetry and control over NVIDIA GPUs and NVSwitches.
nvidia-smi provides GPU monitoring but lacks full policy and NVSwitch management.
CUDA Toolkit is for GPU programming and development.
Nsight Systems is focused on performance profiling and debugging.
Therefore, DCGM is the single tool that meets all the listed requirements.
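
As a rough orientation, the listed requirements can be matched to subcommands of dcgmi, the DCGM command-line client. The mapping below is an assumption made for illustration rather than an official correspondence, and subcommand availability may differ between DCGM versions. In particular, the nvlink subcommand is assumed here to cover only link status and error reporting, not NVSwitch configuration. The sketch builds help invocations only and does not run them.

```python
# Assumed mapping from each listed requirement to dcgmi subcommands.
# This is illustrative; check `dcgmi --help` on the installed version.
DCGMI_SUBCOMMANDS = {
    "GPU behavior monitoring": ["dmon"],
    "GPU configuration management": ["config"],
    "GPU policy oversight": ["policy"],
    "GPU health and diagnostics": ["health", "diag"],
    "GPU accounting and process statistics": ["stats"],
    "NVSwitch configuration and monitoring": ["nvlink"],
}


def help_commands(requirement):
    """Return one `dcgmi <subcommand> --help` argv list per mapped subcommand."""
    return [["dcgmi", sub, "--help"] for sub in DCGMI_SUBCOMMANDS[requirement]]


for requirement in DCGMI_SUBCOMMANDS:
    for argv in help_commands(requirement):
        print(f"{requirement}: {' '.join(argv)}")
```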

Question #5 (Topic: Demo Questions)

A system administrator notices that jobs are failing intermittently on Base Command Manager due to incorrect GPU configurations in Slurm. The administrator needs to ensure that jobs utilize GPUs correctly.
How should they troubleshoot this issue?

A.

Increase the number of GPUs requested in the job script to avoid using unconfigured GPUs.

B.

Check if MIG (Multi-Instance GPU) mode has been enabled incorrectly and reconfigure Slurm accordingly.

C.

Verify that non-MIG GPUs are automatically configured in Slurm when detected, and adjust configurations if needed.

D.

Ensure that GPU resource limits have been correctly defined in Slurm’s configuration file for each job type.

Correct Answer: B

Explanation:

Misconfiguration related to MIG mode can cause Slurm to improperly allocate GPUs, leading to job failures. The administrator should verify whether MIG has been enabled on the GPUs and ensure that Slurm’s configuration matches the hardware setup. If MIG is enabled, Slurm must be configured to recognize and schedule MIG partitions correctly to avoid resource conflicts.
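
One way to begin the check described above is to list which GPUs currently report MIG mode as enabled. The sketch below parses text in the shape produced by nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader. The function name mig_enabled_gpus and the sample output are hypothetical, and the exact output format is assumed to be one "index, mode" pair per line.

```python
def mig_enabled_gpus(query_output):
    """Return indices of GPUs whose current MIG mode is 'Enabled'."""
    enabled = []
    for line in query_output.strip().splitlines():
        # Each line is assumed to look like "0, Enabled" or "1, Disabled".
        index, mode = (field.strip() for field in line.split(","))
        if mode == "Enabled":
            enabled.append(int(index))
    return enabled


# Hypothetical sample output; GPUs without MIG support report "[N/A]".
sample = "0, Enabled\n1, Disabled\n2, [N/A]\n"
print(mig_enabled_gpus(sample))
```

The resulting list can then be compared against the GPU resources defined for the node in Slurm to spot a mismatch between the hardware state and the scheduler configuration.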

NVIDIA NCP-AIO - NVIDIA Certified Professional AI Operations Certification Exam
