Addressing Serviceability Throughout Device Lifecycle with High-Speed Access for Test
Amit Pandey
Amazon
Austin, USA
[email protected]
Brendan Tully
Amazon
Austin, USA
[email protected]
Karthikeyan Natarajan
Synopsys
Sunnyvale, California
[email protected]
Abstract
The size and complexity of modern large-scale SoCs continue to grow exponentially, especially for the cutting-edge chips used in Artificial Intelligence (AI) applications. With this rapid growth in semiconductor complexity, and with higher expectations for SoC performance and longevity, devices need to be monitored continuously throughout their life cycle to maximize performance and to identify defects before they impact system operation. The industry has used a variety of test methodologies to diagnose, and potentially repair, failing devices while they are still in the system. When reused for Silicon Lifecycle Management (SLM), scan vectors are effective at targeting specific parts of a device and simplify diagnosis. Together with BIST and the various sensors embedded in the silicon, these vectors allow datacenter servers to be monitored remotely and repaired before failures impact functional system operation. However, the volume of this test data is generally very large, which makes it difficult to apply in-system without a robust, high-speed network access mechanism. The High-Speed I/O access mechanism provides ample bandwidth because it carries test data over the native protocol of the high-speed interfaces. In this presentation, we provide an overview of a High-Speed I/O access solution that enables scan vectors to be applied in-field in an Amazon Web Services (AWS) Machine Learning (ML) and AI acceleration system. We show how an embedded microcontroller sits at the heart of such a system and enables remote monitoring and servicing. We also present examples of how the high-speed access architecture can be applied to structures such as BIST, JTAG, redundancy/repair, and PVT sensors and monitors to address datacenter serviceability needs.
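The bandwidth argument above reduces to simple arithmetic: transfer time is payload size divided by the effective link rate. The Python sketch below is a back-of-envelope comparison under purely hypothetical assumptions: a 1 GB scan payload, a JTAG-style link shifting one bit per clock at 50 MHz, and a PCIe Gen4 x4 link used only as a stand-in for a high-speed interface, with an assumed 80% protocol efficiency on top of its 128b/130b line encoding. None of these figures come from the AWS system described here, and the helper `transfer_seconds` is hypothetical.

```python
"""Back-of-envelope transfer-time comparison for in-system scan data.

All numbers below are illustrative assumptions, not figures from the paper.
"""


def transfer_seconds(payload_bits: float, raw_rate_bps: float, efficiency: float) -> float:
    """Return the time needed to move payload_bits over a link.

    raw_rate_bps is the raw signalling rate; efficiency (0 < efficiency <= 1)
    folds in line-encoding and packet/protocol overhead.
    """
    if raw_rate_bps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("rate must be positive and efficiency must be in (0, 1]")
    return payload_bits / (raw_rate_bps * efficiency)


# Hypothetical payload: 1 GB of scan vector data (assumption).
PAYLOAD_BITS = 8e9

# Assumed JTAG-style link: one TDI bit per TCK cycle at 50 MHz,
# ignoring TAP state-machine overhead.
jtag_s = transfer_seconds(PAYLOAD_BITS, raw_rate_bps=50e6, efficiency=1.0)

# Assumed high-speed link, PCIe Gen4 x4 as a stand-in: 16 GT/s per lane,
# 128b/130b encoding, plus an assumed 80% packet/protocol efficiency.
hsio_s = transfer_seconds(PAYLOAD_BITS, raw_rate_bps=4 * 16e9, efficiency=(128 / 130) * 0.8)

print(f"JTAG-style link:  {jtag_s / 60:.1f} min")
print(f"High-speed link:  {hsio_s:.3f} s")
print(f"Speed-up:         {jtag_s / hsio_s:.0f}x")
```

Under these assumptions the JTAG-style link needs a few minutes while the high-speed link needs well under a second. Real ratios depend on the actual interface, test data compression, and how fast the on-die logic can consume the incoming data.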
Keywords
Silicon Lifecycle Management (SLM), Scan, Automatic Test Pattern Generation (ATPG), High-Speed I/O, Functional Protocol, System Level Test (SLT), In-System Test (IST), Built-in Self-Test (BIST), Sensors