CPSC 418: Invited Lecture Series

This year, three distinguished lecturers in computer architecture will give guest lectures in CPSC 418. Here's the schedule. All lectures will be at 8:30 am in room CICSR/CS 208.

March 9, 2000: Mark Hill, University of Wisconsin

Multicast Snooping: A New Coherence Method Using a Multicast Address Network

I present a new coherence method called "multicast snooping" that dynamically adapts between broadcast snooping and a directory protocol. Multicast snooping is unique because processors predict which caches should snoop each coherence transaction by specifying a multicast "mask." Transactions are delivered with an ordered multicast network, such as an Isotach network, which eliminates the need for acknowledgement messages. Processors handle transactions as they would with a snooping protocol, while a simplified directory operates in parallel to check masks and gracefully handle incorrect ones (e.g., previous owner missing). Preliminary performance numbers with mostly SPLASH-2 benchmarks running on 32 processors show that we can limit multicasts to 2-6 destinations (<< 32) and we can deliver 2-5 multicasts per network cycle (>> broadcast snooping's 1 per cycle). While these results do not include timing, they do provide encouragement that multicast snooping can obtain data directly (like broadcast snooping) but apply to larger systems (like directories).
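The mask-checking idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation; the function name and state representation below are hypothetical. It shows only the core check: a predicted multicast mask is sufficient if it covers the block's current owner and all sharers, and when it is not, the directory can supply a corrected mask so the transaction is handled gracefully.

```python
def check_mask(predicted_mask, owner, sharers):
    """Check a predicted multicast mask against true directory state.

    predicted_mask: set of cache IDs the requester guessed should snoop.
    owner: cache ID currently owning the block (or None if memory owns it).
    sharers: set of cache IDs holding shared copies.

    Returns (ok, mask): ok is True if the prediction was sufficient;
    mask is the (possibly corrected) set of caches that must snoop.
    """
    required = set(sharers)
    if owner is not None:
        required.add(owner)          # the owner must supply the data
    if required <= predicted_mask:   # prediction covered everyone needed
        return True, predicted_mask
    # Incorrect mask (e.g., previous owner missing): directory corrects it.
    return False, predicted_mask | required

# Example: a requester guessed only cache 1, but cache 1 owns the block
# and cache 5 also holds a shared copy.
ok, mask = check_mask({1}, owner=1, sharers={5})
print(ok, sorted(mask))  # False [1, 5]
```

A real protocol would also reissue the transaction with the corrected mask and feed the outcome back into the predictor; the sketch stops at the check itself.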

This talk describes joint work with E.E. Bilir, R.M. Dickson, Y. Hu, M. Plakal, D.J. Sorin, and D.A. Wood. See the paper by the same title in Proc. ISCA'99, pp. 294-304.


March 16, 2000: Ravi Nair, IBM Thomas J. Watson Laboratories

Exploiting Instruction Level Parallelism in Processors by Caching Scheduled Groups

Modern processors employ a large amount of hardware to dynamically detect parallelism in single-threaded programs and maintain the sequential semantics implied by these programs. The complexity of some of this hardware diminishes the gains due to parallelism because of a longer clock period or increased pipeline latency.

We propose a processor implementation which dynamically schedules groups of instructions while executing them on a simple engine, and caches the scheduled groups for repeated execution on a fast VLIW-type engine. Our experiments show that scheduling groups spanning several basic blocks and caching these scheduled groups results in significant performance gains over fill-buffer approaches for a standard VLIW cache.

This concept, which we call DIF (Dynamic Instruction Formatting), unifies and extends principles underlying several schemes being proposed today to reduce superscalar processor complexity. The paper examines various issues in designing such a processor and presents results of experiments using trace-driven simulation of SPECint95 benchmark programs.
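The caching structure the abstract describes can be sketched in miniature. This is a hypothetical illustration, not the paper's design: `GroupCache` and `schedule_fn` are invented names. It shows only the control flow: the first execution of a group takes the slow path (schedule while executing, then cache the result), and later executions reuse the cached scheduled group on the fast path.

```python
class GroupCache:
    """Toy cache of scheduled instruction groups, keyed by start address."""

    def __init__(self):
        self.cache = {}   # start address -> scheduled group
        self.hits = 0
        self.misses = 0

    def fetch(self, start_addr, schedule_fn):
        group = self.cache.get(start_addr)
        if group is None:
            # Slow path: execute on the simple engine while scheduling,
            # then cache the scheduled group for reuse.
            self.misses += 1
            group = schedule_fn(start_addr)
            self.cache[start_addr] = group
        else:
            # Fast path: reissue the already-scheduled group to the
            # VLIW-type engine.
            self.hits += 1
        return group

# Example: a loop body at address 0x400 is scheduled once, then reused.
gc = GroupCache()
for _ in range(3):
    gc.fetch(0x400, lambda addr: f"scheduled-group@{addr:#x}")
print(gc.misses, gc.hits)  # 1 2
```

The payoff the abstract claims comes from exactly this reuse: scheduling cost is paid once per group, while repeated executions (the common case in loops) hit the cache.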

This talk describes joint work with Martin Hopkins. See the paper by the same title in Proc. ISCA'97, pp. 13-25.


March 23, 2000: Neil Wilhelm, Sun Laboratories

Limitations of the Superscalar Approach to Architecture
Copyright 2000 Mark R. Greenstreet
mrg@cs.ubc.ca
Last Modified: February 2, 2000