Teruo Utsumi has over twenty years of experience in computer hardware. He is an expert in high performance computer architectures and specializes in optimizing hardware implementations of algorithms. His extensive experience in computer system design spans all levels of system and subsystem architecture, analysis of the interactions between hardware and software, RAS (reliability, availability, and serviceability), and testing, as well as logic and physical design of semiconductor chips.
Before starting Mugen Computing, he spent twelve years at SGI in Mountain View and Sunnyvale, CA. His work there included logic and physical design of ASICs and FPGAs for the SGI Origin and Altix shared memory supercomputers, widely acknowledged to be the most usable large memory computer systems in the industry. Other significant contributions include subsystem architecture and coordination of cross-geography hardware and software teams.
His last position at SGI was Technical Lead Engineer for its FPGA-based Reconfigurable Application Specific Computing (RASC) product line. Over the past several years RASC has been widely regarded as the best high performance reconfigurable computing platform in the industry. His primary contributions to RASC include the architecture and implementation of several major feature enhancements and the coordination of design efforts between the hardware and software teams. He is known for his innovative hardware architectures, his ability to find optimizations for new and existing implementations, and his knowledge of how to exploit the capabilities of FPGAs in application fields such as DSP. Considered the company expert on FPGA design and all aspects of hardware for reconfigurable computing, he consulted for internal and external organizations on the usage of RASC and the design and optimization of a variety of algorithms.
From 1988 to 1997, he worked at Fujitsu Ltd. in Kawasaki, Japan. His work there started with what later became known as Fujitsu’s VPP supercomputer product line, where he was the third person to join the hardware team. This led to the development of the VPP500 vector parallel supercomputer and the installation of the Numerical Wind Tunnel (NWT) computer at Japan’s National Aerospace Laboratory. As noted in the supercomputing Top500 list, the NWT ranked as the world’s fastest supercomputer for two years from 1993 to 1995 and held a top three ranking from 1993 to 1997. The NWT also received recognition for significant scientific work. Three applications that ran on the NWT were recognized with the Gordon Bell Prize: Honorable Mention for Performance in 1994 and First Prize for Performance in 1995 and 1996.
During the initial design phase he worked with a small team to set and refine design parameters for the VPP product line that support key aspects of parallel processing. He was a major contributor to the development of terminology and concepts that led to the Interprocessor Communication Protocol and numerous patents, including 11 with the U.S. Patent Office.
In the development phase he designed several chips and developed the microarchitecture for the low-end, 32-processor implementation of the crossbar interprocessor network. For the second-generation VPP700 he developed highly innovative implementation and testing schemes for the crossbar network that provided scaling up to 256 processors and reduced its maximum size by a factor of four compared to the network designed for the first-generation VPP500.
His professional interests include high performance computing and the advancement of reconfigurable computing into mainstream computing.
Co-inventor of the following patents:
Pat. No. Title
5,896,501 Multiprocessor system and parallel processing method for processing data transferred between processors
5,822,785 Data transfer using local and global address translation and authorization
5,664,104 Transfer processor including a plurality of failure display units wherein a transfer process is prohibited if failure is indicated in a failure display unit
5,652,905 Data processing unit
5,634,071 Synchronous processing method and apparatus for a plurality of processors executing a plurality of programs in parallel
5,625,846 Transfer request queue control system using flags to indicate transfer request queue validity and whether to use round-robin system for dequeuing the corresponding queues
5,623,688 Parallel processing system including instruction processor to execute instructions and transfer processor to transfer data for each user program
5,592,680 Abnormal packet processing system
5,592,628 Data communication system which guarantees at a transmission station the arrival of transmitted data to a receiving station and method thereof
5,572,680 Method and apparatus for processing and transferring data to processor and/or respective virtual processor corresponding to destination logical processor number
5,557,744 Multiprocessor system including a transfer queue and an interrupt processing unit for controlling data transfer between a plurality of processors
Patents similar to those above were granted in Japan and the EU.
pending Direct Memory Access Engine for Data Transfers
“Creating the World's Largest Reconfigurable Supercomputing System Based on the Scalable SGI® Altix® 4700 System Infrastructure and Benchmarking Life-Science Applications,” ARC '08: Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications (co-author).
“Architecture of the VPP500 Parallel Supercomputer,” Supercomputing '94: Proceedings of the 1994 Conference on Supercomputing (primary author).