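The UFM Cyber-AI platform builds on the strengths of UFM Telemetry and UFM Enterprise, providing preventive maintenance and cybersecurity that lower supercomputing operating expenses.
Key features:
• Includes the capabilities of UFM Telemetry and UFM Enterprise
• Analyzes performance degradation and application profile changes over time
• Detects abnormal cluster behavior
• Uses AI to correlate phenomena, including ones that may appear unrelated
• Reports alerts for preventive maintenance
• Improves predictability through ongoing system data collection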
NVIDIA UNIFIED FABRIC MANAGER (UFM) PORTFOLIO
AI-Powered Cyber Intelligence and Analytics Platforms
Data centers host many users and applications and have become the competitive advantage for research organizations and manufacturing companies. Keeping the data center intact and healthy is critical: a data center shutdown can mean the loss of millions of dollars. What’s more, malicious users often exploit data center access to misuse compute resources, for example by running prohibited applications, which results in higher operating costs.
NVIDIA® UFM® platforms revolutionize InfiniBand network management. By combining enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics, the UFM platforms empower you to discover operational anomalies and predict network failures for preventive maintenance. UFM platforms comprise multiple levels of solutions and capabilities to suit your data center’s needs and requirements.
At the basic level, the UFM Telemetry platform provides network validation tools and monitors network performance and conditions. It captures data such as rich real-time network telemetry, workload usage, and system configuration, and streams it to a defined on-premises or cloud-based database for further analysis.
The mid-tier UFM Enterprise platform adds enhanced network monitoring, management, workload optimizations, and periodic configuration checks. In addition to including all of the UFM Telemetry services, it provides network setup, connectivity validation, secure cable management, automated network discovery and network provisioning, traffic monitoring, and congestion discovery. UFM Enterprise also enables job scheduler provisioning and integration with Slurm and Platform LSF, as well as network provisioning and integration with OpenStack, Azure Cloud, and VMware.
The enhanced UFM Cyber-AI platform includes all of the UFM Telemetry and UFM Enterprise services. The unique advantages of the Cyber-AI platform are based on capturing rich InfiniBand telemetry information over time and utilizing deep learning algorithms. The platform learns the data center’s “heartbeat,” operation mode, conditions, usage, and workload network signatures. It builds an enhanced database of telemetry information and discovers correlations between events. It detects performance degradations and changes in usage and profile over time, and it raises alerts on abnormal system and application behavior and on potential system failures. The Cyber-AI platform can also perform corrective actions.
In addition to detecting past and current events, the Cyber-AI platform can indicate future performance degradations or abnormal usage of the data center computing resources by translating and correlating changes in the data center heartbeat. Such changes and correlations trigger predictive analytics and initiate alerts that indicate abnormal system and application behavior, as well as potential system failures. System administrators can quickly detect and respond to such potential security threats and address upcoming failures efficiently, saving OPEX and maintaining end-user SLAs. Predictability improves over time as additional system data is collected.
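As a rough illustration of what flagging a change in a telemetry "heartbeat" can mean, the sketch below compares each new sample against a baseline learned from the preceding samples. It is not the Cyber-AI implementation, which is described above as using deep learning; a simple rolling z-score stands in for it here. The function name `detect_anomalies`, the `window` and `threshold` parameters, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from a rolling baseline.

    Hypothetical helper: a rolling z-score stands in for the learned
    "heartbeat" model; it is not how UFM Cyber-AI works internally.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]   # the preceding `window` samples
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as an anomaly.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up port-counter readings with one sudden spike at index 7.
readings = [100, 101, 99, 100, 102, 100, 101, 250, 100, 99]
print(detect_anomalies(readings))  # [7]
```

A longer window gives a steadier baseline but reacts more slowly to genuine shifts in workload, which loosely mirrors the point above that predictability improves as more system data is collected.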
UFM ENTERPRISE VISUALIZATION DASHBOARD
UFM EDITION FEATURES