{"id":225,"date":"2024-05-28T02:39:38","date_gmt":"2024-05-28T02:39:38","guid":{"rendered":"https:\/\/ieee-ras.conferences.computer.org\/2024\/?page_id=225"},"modified":"2024-05-28T02:40:59","modified_gmt":"2024-05-28T02:40:59","slug":"invited_talk_rama_bhimanadhuni_abstract","status":"publish","type":"page","link":"https:\/\/ieee-ras.conferences.computer.org\/2024\/invited_talk_rama_bhimanadhuni_abstract\/","title":{"rendered":"Invited_talk_Rama_Bhimanadhuni_Abstract"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-page\" data-elementor-id=\"225\" class=\"elementor elementor-225\" data-elementor-post-type=\"page\">\n\t\t\t\t<div class=\"elementor-element elementor-element-40cfc98 e-flex e-con-boxed e-con e-parent\" data-id=\"40cfc98\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-83cd2b2 elementor-widget elementor-widget-text-editor\" data-id=\"83cd2b2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"x_MsoNormal\"><b>Title<\/b>:\u00a0<span class=\"x_ui-provider\">Enabling Generative AI- Exploring RAS Requirements for Hyperscale AI Infrastructure<\/span><\/p><p><b>Speaker<\/b>: Rama Bhimanadhuni<\/p><p class=\"x_MsoNormal\"><span class=\"x_ui-provider\">Generative AI workloads have led to a fast increase of GPUs and accelerators in Cloud Data Centers at hyperscale. AI workloads are evolving swiftly, which creates more demand for hardware resources such as computational power, memory, networking, and high-speed interconnects. However, at the scale of AI supercomputers, hardware failure rates are also increasing across these resources, requiring RAS technology innovations spanning Silicon, Server, Firmware, Software, Rack, and Fleet. 
Based on insights from Hyperscale AI infrastructure fleets, this technical talk will explain the importance of RAS requirements for reducing job disruptions, ensuring Hardware Error Resilience, improving Maintenance and Serviceability, and enabling Root Cause Analysis and Failure Prediction. The session will also showcase recent industry efforts to standardize RAS across GPUs and Accelerators through the OCP Hardware management workstream.<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Title:\u00a0Enabling Generative AI- Exploring RAS Requirements for Hyperscale AI Infrastructure Speaker: Rama Bhimanadhuni Generative AI workloads have led to a fast increase of GPUs and accelerators in Cloud Data Centers at hyperscale. AI workloads are evolving swiftly, which creates more demand for hardware resources such as computational power, memory, networking, and high-speed interconnects. 
However, at [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"elementor_canvas","meta":{"footnotes":""},"class_list":["post-225","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ieee-ras.conferences.computer.org\/2024\/wp-json\/wp\/v2\/pages\/225","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ieee-ras.conferences.computer.org\/2024\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ieee-ras.conferences.computer.org\/2024\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ieee-ras.conferences.computer.org\/2024\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/ieee-ras.conferences.computer.org\/2024\/wp-json\/wp\/v2\/comments?post=225"}],"version-history":[{"count":0,"href":"https:\/\/ieee-ras.conferences.computer.org\/2024\/wp-json\/wp\/v2\/pages\/225\/revisions"}],"wp:attachment":[{"href":"https:\/\/ieee-ras.conferences.computer.org\/2024\/wp-json\/wp\/v2\/media?parent=225"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}