































## L2 Cache

•Designed for high bandwidth and low power

🏶 Sun

17

•4 way banked 3MB L2 cache

•12 way set associative to handle 32 threads

•Data is 64B interleaved across banks

•64B block size

•Multiple outstanding miss handling

•Writeback protocol





## L2 Cache – Directory

•Reverse mapped directory implemented in L2

- Tracking done by L1 line instead of L2 line by shadowing the L1 tags
- Ratio of no of lines of L1 to lines of L2 make this more efficient, even though it is a cam structure
- Indices are arranged so that stores cam against 1/16<sup>th</sup> of the directory making this very power efficient



## Conclusions

•Commercial Server Workloads exhibit high degrees of Thread Level Parallelism, but poor locality of reference

•32 threads are implemented on a single CPU to hide memory and pipeline stalls and maximize parallel memory accesses

•A high bandwidth memory subsystem with shared cache, services memory references

•Sharing of resources at all levels leads to a very area and power efficient design

🏶 Sun

