Certs Club

NVIDIA NCP-AII - NVIDIA Certified Professional AI Infrastructure Certification Exam

Question #1 (Topic: Demo Questions)

A systems administrator is preparing a new DGX server for deployment. What is the most secure approach to configuring the BMC port during initial setup?

A.
Enable remote access to the BMC over the internet using the default admin credentials for initial troubleshooting.
B.
Connect the BMC port directly to the production network and retain default admin credentials for convenience.
C.
Leave the BMC port disconnected until after the operating system is fully configured and in production.
D.
Connect the BMC port to a dedicated and firewalled network and change the default admin credentials.
Correct Answer: D
Explanation:
The Baseboard Management Controller (BMC) gives an administrator full control over the DGX system, including the ability to flash firmware, cycle power, and access the serial console. That makes it a high-value target for attackers. The secure approach (Option D) combines two layers:
Network Isolation: The BMC port should never be exposed to the public internet (Option A) or even the general production network (Option B). It must reside on a dedicated Out-of-Band (OOB) network that is firewalled and accessible only to authorized administrators.
Credential Management: Any factory-default credentials (for example admin/admin) must be changed immediately upon first access. As part of the DGX first-boot wizard, the system prompts the administrator to create a strong, unique password for the primary user, which is then synchronized to the BMC.
Leaving the port disconnected (Option C) is impractical for modern data center operations, because the BMC is required for remote monitoring and headless deployment. The isolated, firewalled approach keeps the AI Factory resilient against both external attacks and internal lateral movement.
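The two layers described above can be expressed as a small pre-deployment check. This is a minimal sketch, not an NVIDIA tool: the subnet `10.20.0.0/24` and the function name `bmc_config_issues` are hypothetical, and the check covers only addressing and credentials, not whether a firewall is actually in place.

```python
import ipaddress

# Hypothetical OOB management subnet; substitute the site's real value.
OOB_SUBNET = ipaddress.ip_network("10.20.0.0/24")
# Example default pair mentioned in the explanation above.
DEFAULT_CREDENTIALS = {("admin", "admin")}


def bmc_config_issues(bmc_ip: str, username: str, password: str) -> list[str]:
    """Return a list of problems found in a proposed BMC configuration."""
    issues = []
    addr = ipaddress.ip_address(bmc_ip)
    if addr.is_global:
        # Publicly routable address: corresponds to the risk in Option A.
        issues.append("BMC address is publicly routable")
    elif addr not in OOB_SUBNET:
        # Private, but not on the dedicated management network (Option B).
        issues.append("BMC address is outside the dedicated OOB subnet")
    if (username, password) in DEFAULT_CREDENTIALS:
        issues.append("default credentials are still in use")
    return issues
```

An empty list means the proposed address and credentials pass both checks; anything else names the layer that still needs attention.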
Question #2 (Topic: Demo Questions)

A team is installing the NVIDIA Run:ai control plane on a Kubernetes cluster. Which two (2) options are most critical to validate before proceeding? (Pick the 2 correct responses below)

A.
Helm is installed on the installer machine.
B.
Ensure Kubernetes is running on the cluster.
C.
All cluster nodes have NVIDIA GPUs installed.
D.
NTP is disabled to simplify time synchronization.
Correct Answer: A, B
Explanation:
NVIDIA Run:ai is an advanced orchestration platform designed to optimize GPU resource allocation within Kubernetes environments. Because Run:ai is cloud-native, its control plane and worker agents are deployed as Kubernetes resources. Therefore, the absolute first prerequisite is a running Kubernetes cluster (Option B) to host the services. Secondly, Run:ai utilizes Helm, the package manager for Kubernetes, to manage its complex installation charts, deployments, and service configurations. Without Helm installed on the administrative machine (Option A), the installation scripts will fail to execute. While having GPUs (Option C) is the ultimate goal for the worker nodes, the control plane itself can be installed on a cluster before all GPU hardware is physically present. Disabling NTP (Option D) is never recommended; in fact, accurate time synchronization is vital for the TLS certificates and logging used by Run:ai and Kubernetes.
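A preflight check for the installer machine might look like the sketch below. It only confirms that the required command-line tools are on PATH; `helm` comes from the explanation above, while checking for `kubectl` is an assumption about how the installer reaches the cluster. The helper name `missing_tools` is hypothetical.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    real binaries being installed.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools(["helm", "kubectl"])` on the installer machine returns an empty list when both tools are available. It does not prove that the cluster itself is running; that would need a separate check against the Kubernetes API, for example with `kubectl cluster-info`.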
Question #3 (Topic: Demo Questions)

What is the primary purpose of performing a NeMo burn-in on a new AI infrastructure?

A.
To benchmark production training speed and ensure all GPUs are running at identical clock speeds.
B.
To stress test the hardware and software stack with representative NeMo workloads, ensuring reliability.
C.
To tune NeMo model hyperparameters for maximum accuracy on user datasets during cluster deployment.
Correct Answer: B
Explanation:
The primary purpose of a NeMo burn-in is to stress test the hardware and software stack using representative NeMo workloads before releasing the AI infrastructure to production. NeMo workloads can exercise GPU compute, GPU memory, CUDA libraries, NCCL communication, storage access, checkpointing, container runtime, scheduler integration, and distributed training behavior. This makes NeMo burn-in more realistic than simply checking that GPUs are visible or that a small synthetic benchmark runs successfully. The goal is not to tune hyperparameters for model accuracy, because burn-in validates infrastructure reliability rather than model quality. It is also not mainly about ensuring all GPUs run at identical clock speeds; clock behavior can vary based on power, thermals, workload, and GPU boost behavior. What matters is that the workload runs reliably, without stalls, NCCL failures, GPU Xid errors, storage bottlenecks, memory faults, or unstable performance. In NVIDIA AI infrastructure validation, representative workload burn-in bridges the gap between low-level diagnostics and real production training, helping detect issues that synthetic tests alone may miss.
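One way to act on the failure modes listed above is to scan the logs collected during the burn-in for known error signatures. The sketch below is an illustration only: the two patterns (`NVRM: Xid` for GPU Xid events in the kernel log and `NCCL WARN` for NCCL warnings) are assumptions about typical log output and should be adjusted to what the cluster actually emits, and `scan_burn_in_log` is a hypothetical helper name.

```python
import re

# Assumed failure signatures; adjust to the log formats seen on the cluster.
FAILURE_PATTERNS = {
    "gpu_xid": re.compile(r"NVRM: Xid"),
    "nccl_warning": re.compile(r"NCCL WARN"),
}


def scan_burn_in_log(lines):
    """Count the log lines that match each failure signature."""
    counts = {name: 0 for name in FAILURE_PATTERNS}
    for line in lines:
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A burn-in that ends with every count at zero has not necessarily passed, since stalls and throughput drift leave no such lines, but any non-zero count is a clear reason to investigate before releasing the system.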
Question #4 (Topic: Demo Questions)

If two ports must be connected but one is SFP and the other is QSFP (for example, connecting a 25 GbE host channel adapter to a QSFP port capable of both 100 GbE and 25 GbE), which of the following solutions would best meet this requirement?

A.
SFP Connectors
B.
SFP to 1G BASE-T (RJ45) adapter
C.
QSA Adapter
Correct Answer: C
Explanation:
The QSA (QSFP to SFP Adapter) is a mechanical and electrical adapter that allows a single-lane SFP/SFP28 transceiver or cable (typically 10G or 25G) to be plugged into a four-lane QSFP/QSFP28 port. In AI infrastructure, it is commonly used to connect lower-speed management servers or legacy nodes to a high-speed switch without requiring breakout cables. The QSA maps the single lane of the SFP module to the first lane of the QSFP port, and the remaining three lanes are unused. It is a pass-through solution that maintains the signal integrity and latency characteristics of the link. Option A does not resolve the form-factor mismatch, and Option B converts an SFP port to 1G copper rather than bridging SFP to QSFP.
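The lane mapping can be illustrated with simple arithmetic. This is a minimal sketch assuming a four-lane QSFP28 port at 25 Gb/s per lane, as in the question; the function name `usable_bandwidth_gbps` is hypothetical.

```python
def usable_bandwidth_gbps(
    lane_rate_gbps: int, qsfp_lanes: int = 4, via_qsa: bool = False
) -> int:
    """Bandwidth available on a QSFP port.

    With a QSA fitted, only the first lane carries traffic, so the
    port runs at the single-lane rate.
    """
    active_lanes = 1 if via_qsa else qsfp_lanes
    return lane_rate_gbps * active_lanes
```

With `lane_rate_gbps=25`, the native port gives 100 Gb/s across four lanes, while the same port with a QSA gives 25 Gb/s, which matches the 25 GbE adapter on the other end of the link.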
Question #5 (Topic: Demo Questions)
A system administrator needs to configure a BlueField DPU and enable RShim on the baseboard management controller (BMC). Which command should be executed?
A.
ipmitool raw 0x32 0x6a 1
B.
systemctl restart rshim
C.
systemctl enable bmc-rshim.service
D.
scp <path_to_bfb> root@<bmc_ip>:/dev/rshim0/boot
Correct Answer: A
Explanation:
In NVIDIA BlueField DPU architectures, the RShim (Remote-Shim) interface provides a communication channel between the DPU and the host or BMC, typically used for early-stage provisioning, console access, and firmware loading. While the DPU is usually managed via the host's PCIe bus, certain data center configurations require the DPU to be managed out-of-band via the server's Baseboard Management Controller (BMC). To enable this capability, a low-level command must be sent to the BMC to turn on RShim over the internal USB-to-BMC bridge. The command ipmitool raw 0x32 0x6a 1 is the raw IPMI command used in NVIDIA DGX and certified systems to enable the BMC-to-DPU RShim path. Once it is enabled, the BMC sees the DPU as a USB device, allowing the administrator to push a BlueField Boot (BFB) image to /dev/rshim0/boot for OS installation even if the host CPU is powered off or unresponsive. Options B and C are service commands that assume the driver is already loaded and the hardware path is active, and Option D is the later step of pushing a BFB image, which only works after RShim has been enabled. The raw IPMI command is what enables the hardware path itself.
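For scripted provisioning, the command from the correct answer could be wrapped as in the sketch below. The raw bytes come directly from the answer above; the helper names and the dry-run default are hypothetical, and where the command is executed (on the BMC shell or from a management host with additional ipmitool connection options) is left as an assumption for the reader to settle against the platform documentation.

```python
import subprocess

# Raw IPMI bytes quoted in the answer above.
ENABLE_RSHIM_RAW = ["0x32", "0x6a", "1"]


def build_enable_rshim_command() -> list[str]:
    """Build the argv for the ipmitool call quoted in the answer."""
    return ["ipmitool", "raw", *ENABLE_RSHIM_RAW]


def enable_rshim(dry_run: bool = True) -> list[str]:
    """Return the command; execute it only when dry_run is False."""
    argv = build_enable_rshim_command()
    if not dry_run:
        # check=True raises CalledProcessError if ipmitool exits non-zero.
        subprocess.run(argv, check=True)
    return argv
```

Defaulting to a dry run lets the command be reviewed or logged before anything is sent to the BMC; passing `dry_run=False` runs it and raises on failure.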