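The UFM Telemetry platform provides network validation tools and monitors network performance and conditions. It also captures rich real-time network telemetry, application workload usage, and system configuration, and streams them to an on-premises or cloud-based database for further analysis.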
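Key features:
• Switch, adapter, and cable telemetry
• System validation
• Network performance tests
• Streaming of telemetry information to an on-premises or cloud-based database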
NVIDIA UNIFIED FABRIC MANAGER (UFM) PORTFOLIO
AI-Powered Cyber Intelligence and Analytics Platforms
Data centers host many users and applications and have become a competitive advantage for research organizations and manufacturing companies. Keeping the data center intact and healthy is critical: a data center shutdown can mean the loss of millions of dollars. What’s more, malicious users often exploit data center access to misuse compute resources, for example by running prohibited applications, which results in higher operating costs.
NVIDIA® UFM® platforms revolutionize InfiniBand network management. By combining enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics, the UFM platforms empower you to discover operational anomalies and predict network failures for preventive maintenance. The UFM platforms comprise multiple levels of solutions and capabilities to suit your data center’s needs and requirements. At the basic level, the UFM Telemetry platform provides network validation tools and monitors network performance and conditions. It captures rich real-time network telemetry information, workload usage data, and system configuration, and streams them to a defined on-premises or cloud-based database for further analysis.
The mid-tier UFM Enterprise platform adds enhanced network monitoring, management, workload optimizations, and periodic configuration checks. In addition to including all of the UFM Telemetry services, it provides network setup, connectivity validation, secure cable management, automated network discovery, network provisioning, traffic monitoring, and congestion discovery. UFM Enterprise also enables job scheduler provisioning and integration with Slurm and Platform LSF, as well as network provisioning and integration with OpenStack, Azure Cloud, and VMware.
The enhanced UFM Cyber-AI platform includes all of the UFM Telemetry and UFM Enterprise services. The unique advantages of the Cyber-AI platform are based on capturing rich InfiniBand telemetry information over time and applying deep learning algorithms to it. The platform learns the data center’s “heartbeat,” operation mode, conditions, usage, and workload network signatures. It builds an enhanced database of telemetry information and discovers correlations between events. It detects performance degradations and changes in usage and profile over time, and issues alerts on abnormal system and application behavior and on potential system failures. The Cyber-AI platform can also perform corrective actions.
In addition to detecting past and current events, the Cyber-AI platform can indicate future performance degradations or abnormal usage of the data center’s computing resources by translating and correlating changes in the data center heartbeat. Such changes and correlations trigger predictive analytics and initiate alerts that indicate abnormal system and application behavior, as well as potential system failures. System administrators can quickly detect and respond to potential security threats and address upcoming failures efficiently, saving OPEX and maintaining end-user SLAs. Predictions improve over time as additional system data is collected.
UFM ENTERPRISE DASHBOARD
UFM PLATFORM FEATURES BY EDITION