Cache Designs and Tricks
Craig C. Douglas
University of Kentucky Computer Science Department
Lexington, Kentucky, USA
Yale University Computer Science Department
New Haven, Connecticut, USA
douglas@ccs.uky.edu or douglas-craig@cs.yale.edu
http://www.ccs.uky.edu/~douglas
http://www.mgnet.org
Cache Methodology
Motivation:
1. Time to run code = clock cycles running code + clock cycles waiting for memory.
2. For many years, CPUs have sped up an average of 72% per year relative to memory chip speeds.
Hence, memory access is the bottleneck to fast computing.
Definition of a cache:
1. Dictionary: a safe place to hide or store things.
2. Computer: a level in a memory hierarchy.
Diagrams
Serial: a CPU (registers and logic), backed by a cache, backed by main memory.
Parallel: CPU 1, CPU 2, ..., CPU p, each with its own cache (Cache 1, ..., Cache p), connected through a network to a shared memory.
Tuning for Caches
1. Preserve locality.
2. Reduce cache thrashing.
3. Loop blocking when out of cache (see the sketch after this list).
4. Software pipelining.
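To illustrate item 3, here is a minimal C sketch of loop blocking applied to matrix multiplication, the classic example (not taken from these slides). The names N, BS, and matmul_blocked are illustrative, and BS should be tuned so that three BS-by-BS blocks fit in cache together.

    #include <stdio.h>

    #define N  1024  /* matrix dimension (illustrative) */
    #define BS   64  /* block size: pick so three BSxBS blocks fit in cache */

    static double a[N][N], b[N][N], c[N][N];

    /* Blocked matrix multiply, c += a*b.  The three outer loops walk over
     * BSxBS sub-blocks; the three inner loops reuse each block many times
     * while it is still resident in cache, instead of streaming whole rows
     * and columns through the cache on every pass. */
    static void matmul_blocked(void)
    {
        for (int ii = 0; ii < N; ii += BS)
            for (int jj = 0; jj < N; jj += BS)
                for (int kk = 0; kk < N; kk += BS)
                    for (int i = ii; i < ii + BS; i++)
                        for (int j = jj; j < jj + BS; j++) {
                            double s = c[i][j];
                            for (int k = kk; k < kk + BS; k++)
                                s += a[i][k] * b[k][j];
                            c[i][j] = s;
                        }
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                a[i][j] = 1.0;
                b[i][j] = 1.0;
            }
        matmul_blocked();
        printf("c[0][0] = %f\n", c[0][0]);  /* expect N = 1024 */
        return 0;
    }

Without blocking, once a row of the matrices no longer fits in cache, each element of b is reloaded from memory on every pass over a row of a; with blocking, each block is loaded once and then reused BS times before being evicted.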
Memory Banking
This started in the 1960s with both 2- and 4-way interleaved memory
banks. Each bank can produce one unit of memory per bank cycle, so
multiple reads and writes can proceed in parallel.
The bank cycle time is currently 4-8 times the CPU clock time and is getting
worse every year.
Very fast memory (e.g., SRAM) is unaffordable in large quantities.
Interleaving is not perfect. With 2-way interleaved memory, a stride-2
algorithm references only one of the two banks, so it performs no better
than a non-interleaved memory system, as the sketch below illustrates.
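A minimal C sketch of the stride-2 problem, assuming word-level interleaving in which word w lives in bank w % NUM_BANKS (a common convention, assumed here rather than stated in these slides):

    #include <stdio.h>

    #define NUM_BANKS 2   /* 2-way interleaved memory (assumed) */

    int main(void)
    {
        /* Stride-1 access: words 0,1,2,3,... alternate between
         * banks 0 and 1, so both banks work in parallel. */
        for (int w = 0; w < 8; w += 1)
            printf("stride 1: word %2d -> bank %d\n", w, w % NUM_BANKS);

        /* Stride-2 access: words 0,2,4,6,... all map to bank 0.
         * Every reference waits on the same bank's cycle time,
         * so the interleaving gives no speedup at all. */
        for (int w = 0; w < 16; w += 2)
            printf("stride 2: word %2d -> bank %d\n", w, w % NUM_BANKS);

        return 0;
    }

Running this shows the stride-1 references alternating between banks 0 and 1, while every stride-2 reference lands in bank 0 and serializes on that single bank's cycle time.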