
NVIDIA NCP-AII - NVIDIA Certified Professional AI Infrastructure Certification Exam

Question #6 (Topic: Demo Questions)

An engineer needs to verify the current firmware versions of all components (ATF, BSP, NIC, UEFI) on a BlueField-3 DPU's BMC. Which Redfish API command provides this information?

A.
mlxconfig -d <dev> q
B.
curl -k -u root:<password> -X GET https://<DPU-BMC-IP>/redfish/v1/UpdateService/FirmwareList
C.
mstflint -d <PCI_ID> query full
D.
curl -k -u root:<password> -X GET https://<DPU-BMC-IP>/redfish/v1/UpdateService/FirmwareInventory
Correct Answer: D
Explanation:
Modern NVIDIA BlueField DPUs include an integrated Baseboard Management Controller (BMC) that supports the industry-standard Redfish API for out-of-band management. Options A and C are not Redfish calls at all: mlxconfig and mstflint are CLI tools run from the host OS against the NIC device, where mlxconfig queries the NIC configuration and mstflint queries the NIC firmware image. Neither reports the other components the question lists, such as the ARM Trusted Firmware (ATF), the Board Support Package (BSP), or the UEFI bootloader of the DPU. The Redfish standard defines a common URI for firmware inventory under the UpdateService resource. The FirmwareInventory endpoint (Option D) is the correct RESTful path: it returns a JSON collection whose members carry the version details for each firmware component managed on the DPU. This is the preferred method for automated data center management systems (such as NVIDIA Base Command Manager) to verify that DPUs are at the correct "Golden Image" version during the staging phase. "FirmwareList" (Option B) is not a standard Redfish URI.
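As a rough illustration of how an automation script might consume the endpoint in Option D, the Python sketch below reads the FirmwareInventory collection and follows each member link to collect its version. It relies only on the standard Redfish collection layout (a `Members` array of `@odata.id` links, each resolving to a resource with `Id` and `Version`). The function names are hypothetical, and the member IDs a given BlueField BMC returns depend on its firmware release, so none are assumed here.

```python
import base64
import json
import ssl
import urllib.request

INVENTORY_PATH = "/redfish/v1/UpdateService/FirmwareInventory"


def make_fetcher(bmc_ip, user, password):
    """Return a function that GETs a Redfish path and decodes the JSON body."""
    # Equivalent of curl -k: skip TLS certificate verification (lab use only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(path):
        req = urllib.request.Request(
            f"https://{bmc_ip}{path}",
            headers={"Authorization": f"Basic {token}"},
        )
        with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
            return json.load(resp)

    return fetch


def firmware_versions(fetch):
    """Map each inventory member's Id to its Version string."""
    collection = fetch(INVENTORY_PATH)
    versions = {}
    for member in collection.get("Members", []):
        item = fetch(member["@odata.id"])
        # Fall back to the member link if the resource carries no Id.
        versions[item.get("Id", member["@odata.id"])] = item.get("Version")
    return versions
```

A staging script could call `firmware_versions(make_fetcher("<DPU-BMC-IP>", "root", "<password>"))` and compare the resulting dictionary against the expected golden-image versions. Passing the fetch function in as a parameter keeps the parsing logic testable without a live BMC.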
Question #7 (Topic: Demo Questions)

For a 48-hour NCCL burn-in test, which parameters ensure sustained fabric stress while detecting silent data corruption?

A.
broadcast_perf -b 4G -e 16G -w 160
B.
all_reduce_perf -b 8G -e 32G -c 1000 -z 1 -G 1000
C.
all_reduce_perf -b 8G -e 32G -z 1 -G 1000
D.
reduce_scatter_perf -f 2 -g 8
Correct Answer: B
Explanation:
The NVIDIA Collective Communications Library (NCCL) tests are the gold standard for validating the interconnect performance of a GPU cluster. For a long-duration burn-in (48 hours), the goal is not just to measure peak bandwidth, but to stress the fabric under load to catch intermittent hardware failures or "Silent Data Corruption" (SDC). The all_reduce_perf test is the most demanding of the options because every GPU both sends and receives data, which drives traffic in both directions across the fabric. The specific parameters in Option B are critical: -b 8G -e 32G sets the minimum and maximum message sizes to large buffers that saturate the 400G InfiniBand links; -c 1000 is the correctness-check flag, which in current nccl-tests releases requests 1000 iterations with the results verified on each one. This is the most vital flag for SDC detection: if a bit flips during transmission due to a faulty transceiver, the check catches the mismatch, counts it in the #wrong column, and the run is reported as failed. According to the nccl-tests README, -z 1 makes each collective blocking, so the CPU waits and synchronizes after every call, and -G 1000 captures the iterations as a CUDA graph and replays it 1000 times, which lengthens each pass and keeps the switches and HCAs under sustained load. Option C uses the same sizes but drops -c 1000, the only flag that distinguishes it from Option B. Options A and D specify no check flag and no comparable large-message range.
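One way to stretch the Option B command over a 48-hour window is to rerun it in a loop and stop at the first pass that reports corrupted data. The Python sketch below is a minimal illustration under stated assumptions: it assumes the binary is on the PATH, that the run ends with a summary line of the form `# Out of bounds values : <n> OK`, as nccl-tests output typically does, and that a missing summary should be treated as a failure. The helper names are hypothetical, and any multi-node launcher (for example mpirun) would have to be added to the command string.

```python
import re
import shlex
import subprocess
import time

# Option B from the question, passed through unchanged.
OPTION_B = "all_reduce_perf -b 8G -e 32G -c 1000 -z 1 -G 1000"
# Assumed summary line format; adjust if the installed version prints it differently.
OOB_LINE = re.compile(r"Out of bounds values\s*:\s*(\d+)")


def out_of_bounds_count(output):
    """Return the mismatch count from a run's summary, or None if the line is absent."""
    match = OOB_LINE.search(output)
    return int(match.group(1)) if match else None


def burn_in(command=OPTION_B, hours=48.0, run=subprocess.run, clock=time.monotonic):
    """Repeat the test until the time budget is spent; stop at the first bad pass."""
    deadline = clock() + hours * 3600
    passes = 0
    while clock() < deadline:
        result = run(shlex.split(command), capture_output=True, text=True)
        errors = out_of_bounds_count(result.stdout)
        # A non-zero exit code, a missing summary, or any mismatch fails the burn-in.
        if result.returncode != 0 or errors is None or errors > 0:
            return {"ok": False, "passes": passes, "errors": errors}
        passes += 1
    return {"ok": True, "passes": passes, "errors": 0}
```

The `run` and `clock` parameters exist only so the loop can be exercised without GPUs; in real use, `burn_in()` with the defaults would keep launching the test until 48 hours have elapsed or a pass fails.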