Teruo Utsumi has over twenty years of experience in computer hardware.  He is an expert in high performance computer architectures and specializes in optimizing hardware implementations of algorithms.  His extensive experience in computer system design spans all levels of system and subsystem architecture, analysis of the interactions between hardware and software, RAS (reliability, availability, and serviceability), and testing, as well as logic and physical design of semiconductor chips.

Teruo received his Bachelor of Science degree in Electrical Engineering and Computer Science from the University of California, Berkeley, in 1987.

Before starting Mugen Computing, he spent twelve years at SGI in Mountain View and Sunnyvale, CA.  His work there included logic and physical design of ASICs and FPGAs for the SGI Origin and Altix shared memory supercomputers, widely acknowledged to be the most usable large memory computer systems in the industry.  Other significant contributions include subsystem architecture and coordination of cross-geography hardware and software teams.

His last position at SGI was Technical Lead Engineer for its FPGA-based Reconfigurable Application Specific Computing (RASC) product line.  Over the past several years RASC has been widely regarded as the best high performance reconfigurable computing platform in the industry.  His primary contributions to RASC include the architecture and implementation of several major feature enhancements and the coordination of design efforts between the hardware and software teams.  He is known for his innovative hardware architectures, his ability to find optimizations for new and existing implementations, and his knowledge of how to exploit the capabilities of FPGAs in application fields such as DSP.  Considered to be the company expert on FPGA design and all aspects of hardware for reconfigurable computing, he consulted for internal and external organizations on the usage of RASC and on the design and optimization of a variety of algorithms.

From 1988 to 1997, he worked at Fujitsu Ltd. in Kawasaki, Japan.  He started there on what later became known as Fujitsu’s VPP supercomputer product line, as the third person to join the hardware team.  This led to the development of the VPP500 vector parallel supercomputer and the installation of the Numerical Wind Tunnel (NWT) computer at Japan’s National Aerospace Laboratory.  As noted in the supercomputing Top500 list, the NWT ranked as the world’s fastest supercomputer for two years from 1993 to 1995 and held a top-three ranking from 1993 to 1997.  The NWT also received recognition for significant scientific work.  Three applications that ran on the NWT were awarded the Gordon Bell Prize on three occasions: Honorable Mention for Performance in 1994 and First Prize for Performance in 1995 and 1996.

During the initial design phase he worked with a small team to set and refine design parameters for the VPP product line that support key aspects of parallel processing.  He was a major contributor to the development of terminology and concepts that led to the Interprocessor Communication Protocol and numerous patents, including 11 with the U.S. Patent Office.

In the development phase he designed several chips and developed the microarchitecture for the low-end, 32-processor implementation of the crossbar interprocessor network.  For the second-generation VPP700, he developed highly innovative implementation and testing schemes for the crossbar network that provided scaling up to 256 processors and reduced its maximum size by a factor of four compared to the network designed for the first-generation VPP500.

His professional interests include high performance computing and the advancement of reconfigurable computing into mainstream computing.

Co-inventor of the following patents:

Pat. No.    Title
5,896,501    Multiprocessor system and parallel processing method for processing data transferred between processors
5,822,785    Data transfer using local and global address translation and authorization
5,664,104    Transfer processor including a plurality of failure display units wherein a transfer process is prohibited if failure is indicated in a failure display unit
5,652,905    Data processing unit
5,634,071    Synchronous processing method and apparatus for a plurality of processors executing a plurality of programs in parallel
5,625,846    Transfer request queue control system using flags to indicate transfer request queue validity and whether to use round-robin system for dequeuing the corresponding queues
5,623,688    Parallel processing system including instruction processor to execute instructions and transfer processor to transfer data for each user program
5,592,680    Abnormal packet processing system
5,592,628    Data communication system which guarantees at a transmission station the arrival of transmitted data to a receiving station and method thereof
5,572,680    Method and apparatus for processing and transferring data to processor and/or respective virtual processor corresponding to destination logical processor number
5,557,744    Multiprocessor system including a transfer queue and an interrupt processing unit for controlling data transfer between a plurality of processors

Similar patents to those above were granted in Japan and the EU.

pending    Direct Memory Access Engine for Data Transfers

“Creating the World's Largest Reconfigurable Supercomputing System Based on the Scalable SGI® Altix® 4700 System Infrastructure and Benchmarking Life-Science Applications,” ARC '08: Proceedings of the 4th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications (co-author).

“Architecture of the VPP500 Parallel Supercomputer,” Supercomputing '94: Proceedings of the 1994 Conference on Supercomputing (primary author).