
Computer Organization and Design 4th Edition Solutions Chapter 1

1.1

  1. Personal computer.
  2. Server.
  3. Supercomputer.
  4. Embedded computer.

1.2

| Idea from other field | Idea from computer architecture |
|---|---|
| a | Performance via Pipelining |
| b | Dependability via Redundancy |
| c | Performance via Prediction |
| d | Make the Common Case Fast |
| e | Hierarchy of Memories |
| f | Performance via Parallelism |
| g | Design for Moore's Law |
| h | Use Abstraction to Simplify Design |

1.3

  1. A special kind of program called a compiler reads the high-level source code and translates it into a program in assembly language.
  2. Another program called an assembler transforms the program in assembly language into a program in machine language, which is what a computer understands and can execute directly.

Some compilers "cut the middleman" and produce machine code directly.

1.4

  1. $\mathrm{Minimum\ frame\ buffer\ size} = \frac{\mathrm{bytes}}{\mathrm{pixel}} \times \frac{\mathrm{pixels}}{\mathrm{frame}} = \frac{3\ \mathrm{bytes}}{\mathrm{pixel}} \times \frac{1280 \times 1024\ \mathrm{pixels}}{\mathrm{frame}} = 3932160\ \frac{\mathrm{bytes}}{\mathrm{frame}}$
  2. $\mathrm{Transmission\ time/frame} = \frac{\mathrm{Frame\ size}}{\mathrm{Transmission\ rate}} = \frac{3932160 \times 8\ \mathrm{bits}}{100 \times 10^{6}\ \frac{\mathrm{bits}}{\mathrm{second}}} = 0.3145728\ \mathrm{s}$
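The two results above can be checked with a short Python sketch. The function names are illustrative only; the parameters (3 bytes per pixel, a 1280 × 1024 frame, a 100 Mbit/s link) are the ones used in the formulas above.

```python
# Sketch: frame buffer size and per-frame transmission time (exercise 1.4).
# Function names are hypothetical; the numbers come from the worked answer.

def frame_buffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Minimum number of bytes needed to store one frame."""
    return width * height * bytes_per_pixel


def transmission_time_s(size_bytes: int, rate_bits_per_s: float) -> float:
    """Time to send size_bytes over a link of the given bit rate."""
    return size_bytes * 8 / rate_bits_per_s


size = frame_buffer_bytes(1280, 1024, 3)
time_s = transmission_time_s(size, 100e6)
print(size, "bytes per frame")
print(time_s, "seconds per frame")
```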

1.5

a

$$\mathrm{IPS} = \frac{\mathrm{Instructions}}{\mathrm{Second}} = \frac{\mathrm{Instructions}}{\mathrm{Clock\ cycle}} \times \frac{\mathrm{Clock\ cycles}}{\mathrm{Second}} = \frac{\mathrm{Clock\ rate}}{\mathrm{CPI}}$$

| Processor | Instructions per second |
|---|---|
| 1 | $\mathrm{IPS_{1}} = \frac{3.0\ \mathrm{GHz}}{1.5\ \mathrm{CPI}} = 2 \times 10^{9}$ |
| 2 | $\mathrm{IPS_{2}} = \frac{2.5\ \mathrm{GHz}}{1.0\ \mathrm{CPI}} = 2.5 \times 10^{9}$ |
| 3 | $\mathrm{IPS_{3}} = \frac{4.0\ \mathrm{GHz}}{2.2\ \mathrm{CPI}} \approx 1.82 \times 10^{9}$ |

Thus, processor 2 has the highest performance in instructions per second.
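As a quick check of the comparison above, here is a minimal Python sketch. The dictionary layout and names are illustrative; the clock rates and CPIs are the ones used in the table.

```python
# Sketch: instructions per second = clock rate / CPI (exercise 1.5a).

def instructions_per_second(clock_rate_hz: float, cpi: float) -> float:
    return clock_rate_hz / cpi


# processor number -> (clock rate in Hz, CPI), as used in the table above
processors = {1: (3.0e9, 1.5), 2: (2.5e9, 1.0), 3: (4.0e9, 2.2)}

ips = {n: instructions_per_second(f, cpi) for n, (f, cpi) in processors.items()}
fastest = max(ips, key=ips.get)

for n, value in ips.items():
    print(f"processor {n}: {value:.3e} instructions/s")
print("highest IPS: processor", fastest)
```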

b

| Processor | Number of cycles | Number of instructions |
|---|---|---|
| 1 | $(3.0\ \mathrm{GHz}) \times (10\ \mathrm{s}) = 3 \times 10^{10}$ | $(2 \times 10^{9}\ \mathrm{IPS}) \times (10\ \mathrm{s}) = 2 \times 10^{10}$ |
| 2 | $(2.5\ \mathrm{GHz}) \times (10\ \mathrm{s}) = 2.5 \times 10^{10}$ | $(2.5 \times 10^{9}\ \mathrm{IPS}) \times (10\ \mathrm{s}) = 2.5 \times 10^{10}$ |
| 3 | $(4.0\ \mathrm{GHz}) \times (10\ \mathrm{s}) = 4 \times 10^{10}$ | $(1.82 \times 10^{9}\ \mathrm{IPS}) \times (10\ \mathrm{s}) = 1.82 \times 10^{10}$ |

c

Let $I$ be the number of instructions executed. With the CPI increased by 20%, a 30% reduction in execution time can be expressed by the following formula.

$$\frac{\mathrm{Execution\ time_{new}}}{\mathrm{Execution\ time_{old}}} = \frac{I \times (1.2 \times \mathrm{CPI}) \times \mathrm{Clock\ cycle\ time_{new}}}{I \times \mathrm{CPI} \times \mathrm{Clock\ cycle\ time_{old}}} = \frac{1.2 \times \mathrm{Clock\ rate_{old}}}{\mathrm{Clock\ rate_{new}}} = 0.7$$

Thus, $\mathrm{Clock\ rate_{new}} = \frac{1.2}{0.7} \times \mathrm{Clock\ rate_{old}} \approx 1.71 \times \mathrm{Clock\ rate_{old}}$. This represents a 71% increase in clock rate.
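The same ratio can be evaluated numerically. This is only a sketch; `required_clock_rate_factor` is a hypothetical helper name, and the factors 1.2 and 0.7 are the ones from the formula above.

```python
# Sketch: clock-rate factor needed when CPI and execution time both change.
# From time_new / time_old = cpi_factor * (f_old / f_new) = time_factor,
# it follows that f_new / f_old = cpi_factor / time_factor.

def required_clock_rate_factor(cpi_factor: float, time_factor: float) -> float:
    return cpi_factor / time_factor


factor = required_clock_rate_factor(cpi_factor=1.2, time_factor=0.7)
increase_percent = (factor - 1) * 100
print(f"new clock rate = {factor:.4f} x old clock rate")
print(f"increase = {increase_percent:.1f}%")
```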

1.6

In order to find which implementation of the hypothetical Instruction Set Architecture is faster we need to find the execution time of the program under each processor. The execution time of the program can be calculated as follows:

$$\mathrm{CPU\ time} = \frac{\mathrm{CPU\ clock\ cycles}}{\mathrm{Clock\ rate}}$$

Since we know the clock rates of each processor, we need to find out how many clock cycles it takes each processor to execute the program. This number is given by:

$$\mathrm{CPU\ clock\ cycles} = \sum_{i=1}^{n} (\mathrm{CPI}_{i} \times C_{i})$$

In the above formula, $\mathrm{CPI}_{i}$ and $C_{i}$ are the CPI and instruction count, respectively, for each instruction class (A, B, C or D). From the problem description, we know that the program executes $10^{6} \times 10\% = 10^{5}$ instructions of class A, $10^{6} \times 20\% = 2 \times 10^{5}$ instructions of class B, $10^{6} \times 50\% = 5 \times 10^{5}$ instructions of class C and $10^{6} \times 20\% = 2 \times 10^{5}$ instructions of class D.

Thus, for processor P1 we have:

$$\mathrm{CPU\ clock\ cycles_{P1}} = (1 \times 10^{5}) + (2 \times 2 \times 10^{5}) + (3 \times 5 \times 10^{5}) + (3 \times 2 \times 10^{5}) = 2.6 \times 10^{6}$$

And for processor P2 we have:

$$\mathrm{CPU\ clock\ cycles_{P2}} = (2 \times 10^{5}) + (2 \times 2 \times 10^{5}) + (2 \times 5 \times 10^{5}) + (2 \times 2 \times 10^{5}) = 2 \times 10^{6}$$

Hence, the execution times for each processor are:

$$\mathrm{CPU\ time_{P1}} = \frac{2.6 \times 10^{6}}{2.5\ \mathrm{GHz}} = 1.04\ \mathrm{ms}$$

$$\mathrm{CPU\ time_{P2}} = \frac{2 \times 10^{6}}{3\ \mathrm{GHz}} \approx 0.66667\ \mathrm{ms}$$

Therefore, processor P2 is faster.
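The calculation above can be reproduced with a few lines of Python. This is a sketch: the helper name and list layout are illustrative, while the instruction mix, per-class CPIs and clock rates are the ones used in the formulas above.

```python
# Sketch: CPU clock cycles as a sum over instruction classes (exercise 1.6).

def cpu_clock_cycles(cpis, counts):
    """Sum of CPI_i * C_i over all instruction classes."""
    return sum(cpi * count for cpi, count in zip(cpis, counts))


total_instructions = 1_000_000
mix_percent = [10, 20, 50, 20]  # classes A, B, C, D
counts = [total_instructions * pct // 100 for pct in mix_percent]

cycles_p1 = cpu_clock_cycles([1, 2, 3, 3], counts)
cycles_p2 = cpu_clock_cycles([2, 2, 2, 2], counts)

time_p1 = cycles_p1 / 2.5e9  # P1 runs at 2.5 GHz
time_p2 = cycles_p2 / 3.0e9  # P2 runs at 3 GHz

print(cycles_p1, cycles_p2)
print(f"P1: {time_p1 * 1e3:.5f} ms, P2: {time_p2 * 1e3:.5f} ms")
```

Dividing each cycle count by `total_instructions` gives the average CPI of each processor for this program.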

a

Remembering that CPI refers to the average number of clock cycles per instruction for a program (or program segment), we can find the CPI for each processor by dividing the total number of clock cycles needed to execute the program by the number of instructions.

$$\mathrm{CPI_{P1}} = \frac{\mathrm{CPU\ clock\ cycles_{P1}}}{\mathrm{Number\ of\ instructions}} = \frac{2.6 \times 10^{6}}{10^{6}} = 2.6$$

$$\mathrm{CPI_{P2}} = \frac{\mathrm{CPU\ clock\ cycles_{P2}}}{\mathrm{Number\ of\ instructions}} = \frac{2 \times 10^{6}}{10^{6}} = 2$$

b

As calculated before, $\mathrm{CPU\ clock\ cycles_{P1}} = 2.6 \times 10^{6}$, and $\mathrm{CPU\ clock\ cycles_{P2}} = 2 \times 10^{6}$.

1.7

a

To calculate the CPI of the code generated by each compiler, we use the formula $\mathrm{CPI} = \frac{\mathrm{CPU\ clock\ cycles}}{\mathrm{Instruction\ count}}$.

$$\mathrm{CPI_{A}} = \left(\frac{1.1\ \mathrm{s}}{1 \times 10^{-9}\ \mathrm{s}}\right)\left(\frac{1}{1 \times 10^{9}}\right) = 1.1$$

$$\mathrm{CPI_{B}} = \left(\frac{1.5\ \mathrm{s}}{1 \times 10^{-9}\ \mathrm{s}}\right)\left(\frac{1}{1.2 \times 10^{9}}\right) = 1.25$$

b

Let's assume processor 1 is running compiler A's code and processor 2 is running compiler B's code. Applying the formula for the execution time of a program we get:

$$\mathrm{Execution\ time_{P1}} = \mathrm{Instruction\ count_{A}} \times \mathrm{CPI_{A}} \times \mathrm{Clock\ cycle\ time_{P1}}$$

$$\mathrm{Execution\ time_{P2}} = \mathrm{Instruction\ count_{B}} \times \mathrm{CPI_{B}} \times \mathrm{Clock\ cycle\ time_{P2}}$$

Since we know the execution times are equal, we can equate both sides and rearrange terms to get the following equation:

$$\frac{\mathrm{Clock\ cycle\ time_{P1}}}{\mathrm{Clock\ cycle\ time_{P2}}} = \frac{\mathrm{Instruction\ count_{B}} \times \mathrm{CPI_{B}}}{\mathrm{Instruction\ count_{A}} \times \mathrm{CPI_{A}}} = \frac{1.2 \times 10^{9} \times 1.25}{1 \times 10^{9} \times 1.1} \approx 1.36$$

Thus, the clock cycle of processor 1, which is running compiler A's code, is about 36% longer than that of processor 2. In other words, processor 2's clock rate is about 36% higher.

c

Let C be the new compiler. Then the execution time for compiler C's code will be:

$$\mathrm{Execution\ time_{C}} = \mathrm{Instruction\ count_{C}} \times \mathrm{CPI_{C}} \times \mathrm{Clock\ cycle\ time}$$

The amount by which compiler C's code is faster is given by the ratio of the execution times:

$$\frac{\mathrm{Performance_{C}}}{\mathrm{Performance_{A}}} = \frac{\mathrm{Execution\ time_{A}}}{\mathrm{Execution\ time_{C}}} = \frac{1 \times 10^{9} \times 1.1}{6 \times 10^{8} \times 1.1} \approx 1.67$$

Thus, compiler C's code is about 1.67 times faster than compiler A's code. Likewise, it is about 2.27 times faster than compiler B's code.
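The three parts of this exercise can be sketched in Python as follows. Variable names are illustrative; the execution times, instruction counts and the 1 ns cycle time are the values appearing in the formulas above.

```python
# Sketch: CPI, clock-cycle ratio and speed-up for exercise 1.7.

CYCLE_TIME_S = 1e-9  # clock cycle time used in part a


def cpi(exec_time_s: float, instruction_count: float, cycle_time_s: float) -> float:
    """CPI = clock cycles / instruction count."""
    return exec_time_s / cycle_time_s / instruction_count


# Part a: CPI of the code produced by compilers A and B
cpi_a = cpi(1.1, 1.0e9, CYCLE_TIME_S)
cpi_b = cpi(1.5, 1.2e9, CYCLE_TIME_S)

# Part b: equal execution times imply this ratio of clock cycle times (P1 / P2)
cycle_time_ratio = (1.2e9 * cpi_b) / (1.0e9 * cpi_a)

# Part c: compiler C emits 6e8 instructions with a CPI of 1.1
time_c = 6.0e8 * 1.1 * CYCLE_TIME_S
speedup_vs_a = 1.1 / time_c
speedup_vs_b = 1.5 / time_c

print(cpi_a, cpi_b)
print(round(cycle_time_ratio, 2))
print(round(speedup_vs_a, 2), round(speedup_vs_b, 2))
```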

1.8

1.8.1

The text explains that dynamic power is the one that depends on the overall capacitive load of each transistor. However, it only gives proportional formulas. Thus, we will use the following approximation, where $P$ is the dynamic power, $C_{L}$ is the capacitive load, $V$ is the voltage and $f$ is the switching frequency.

$$P = C_{L} V^{2} f$$

Rearranging, we have that the average capacitive load for the Pentium 4 Prescott processor is:

$$C_{L}(\mathrm{Pentium\ 4\ Prescott}) = \frac{P}{V^{2} f} = \frac{90\ \mathrm{W}}{(1.25\ \mathrm{V})^{2} (3.6\ \mathrm{GHz})} = \frac{90\ \mathrm{A \cdot V}}{(1.25\ \mathrm{V})^{2} (3.6 \times 10^{9}\ \mathrm{s^{-1}})} = \frac{90\ \mathrm{C \cdot s^{-1} \cdot V}}{(1.25\ \mathrm{V})^{2} (3.6 \times 10^{9}\ \mathrm{s^{-1}})} = 1.6 \times 10^{-8}\ \frac{\mathrm{C}}{\mathrm{V}} = 16\ \mathrm{nF}$$

Notes on units:

  • A watt can be expressed as the product of current (in amperes) and voltage (V).
  • The ampere (A) is a unit of electrical current, given as a coulomb of charge per second.
  • A farad (F) is a unit of electrical capacitance, expressed as a coulomb of charge per volt.

Similarly, the average capacitive load for the Core i5 Ivy Bridge is:

$$C_{L}(\mathrm{Core\ i5\ Ivy\ Bridge}) = \frac{P}{V^{2} f} = \frac{40\ \mathrm{W}}{(0.9\ \mathrm{V})^{2} (3.4\ \mathrm{GHz})} \approx 14.52\ \mathrm{nF}$$
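Both capacitive loads follow from the same rearranged formula, which the following Python sketch evaluates. It relies on the approximation $P = C_{L} V^{2} f$ adopted above; the function name is illustrative.

```python
# Sketch: average capacitive load from dynamic power, voltage and frequency,
# using the approximation P = C_L * V^2 * f from the text above.

def capacitive_load_farads(dynamic_power_w: float, voltage_v: float,
                           frequency_hz: float) -> float:
    return dynamic_power_w / (voltage_v ** 2 * frequency_hz)


c_prescott = capacitive_load_farads(90.0, 1.25, 3.6e9)
c_ivy_bridge = capacitive_load_farads(40.0, 0.9, 3.4e9)

print(f"Pentium 4 Prescott: {c_prescott * 1e9:.2f} nF")
print(f"Core i5 Ivy Bridge: {c_ivy_bridge * 1e9:.2f} nF")
```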

1.8.2

| Processor | Static power as % of total power | Ratio of static to dynamic power |
|---|---|---|
| Pentium 4 Prescott | $\frac{10\ \mathrm{W}}{10\ \mathrm{W} + 90\ \mathrm{W}} = 10\%$ | $\frac{10\ \mathrm{W}}{90\ \mathrm{W}} = 0.\overline{1}$ |
| Core i5 Ivy Bridge | $\frac{30\ \mathrm{W}}{30\ \mathrm{W} + 40\ \mathrm{W}} \approx 42.86\%$ | $\frac{30\ \mathrm{W}}{40\ \mathrm{W}} = 0.75$ |

1.8.3

First, we can consider the total power consumption as the sum of the static and dynamic power components:

$$P_{\mathrm{total}} = P_{\mathrm{static}} + P_{\mathrm{dynamic}}$$

Since static power consumption is caused by leakage current, we can determine the latter through the following formula, where $I_{L}$ is the leakage current:

$$P_{\mathrm{static}} = V I_{L}$$

Hence, for the Pentium 4 Prescott $I_{L}$ is equal to:

$$I_{L} = \frac{P_{\mathrm{static}}}{V} = \frac{10\ \mathrm{W}}{1.25\ \mathrm{V}} = 8\ \mathrm{A}$$

We want an overall reduction of 10% in power consumption, which means the reduction must be from both the static and dynamic components. Thus, we need to find the new voltage $V_{\mathrm{new}}$ such that the following equation holds:

$$\frac{P_{\mathrm{new}}}{P_{\mathrm{old}}} = \frac{V_{\mathrm{new}} I_{L_{\mathrm{new}}} + C_{L} (V_{\mathrm{new}})^{2} f}{100\ \mathrm{W}} = 0.9$$

This boils down to the following quadratic equation:

$$(C_{L} f) V_{\mathrm{new}}^{2} + (I_{L_{\mathrm{new}}}) V_{\mathrm{new}} - 90\ \mathrm{W} = 0$$

We calculated the value of the capacitive load $C_{L}$ in a previous step. For the Pentium 4 Prescott $C_{L} = 16\ \mathrm{nF}$. Also, since the leakage current is to remain the same, we have all the necessary information to solve the quadratic:

$$(C_{L} f) V_{\mathrm{new}}^{2} + (I_{L}) V_{\mathrm{new}} - 90\ \mathrm{W} = \left(16 \times 10^{-9}\ \frac{\mathrm{C}}{\mathrm{V}}\right)\left(3.6 \times 10^{9}\ \mathrm{s^{-1}}\right) V_{\mathrm{new}}^{2} + (I_{L}) V_{\mathrm{new}} - 90\ \mathrm{W} = \left(57.6\ \frac{\mathrm{A}}{\mathrm{V}}\right) V_{\mathrm{new}}^{2} + (8\ \mathrm{A}) V_{\mathrm{new}} - 90\ \mathrm{W} = 0$$

$$V_{\mathrm{new}} = \frac{-8\ \mathrm{A} \pm \sqrt{(8\ \mathrm{A})^{2} - 4\left(57.6\ \frac{\mathrm{A}}{\mathrm{V}}\right)(-90\ \mathrm{W})}}{2\left(57.6\ \frac{\mathrm{A}}{\mathrm{V}}\right)}$$

Choosing the positive solution of the quadratic equation, we find that $V_{\mathrm{new}} \approx 1.18\ \mathrm{V}$, which represents a reduction of about 5.4% over the original 1.25 volts. Following similar operations for the Core i5 Ivy Bridge, we find that $V_{\mathrm{new}} \approx 0.84\ \mathrm{V}$, a reduction of ~6.51%.
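The quadratic can also be solved numerically. The sketch below assumes, as the derivation does, that the leakage current and the product $C_{L} f$ stay fixed; the function name is hypothetical, and $C_{L} f$ is computed directly from the dynamic power and voltage rather than from the rounded capacitance, so the last digit of the percentages may differ slightly.

```python
import math

# Sketch: voltage giving a 10% total power reduction (exercise 1.8.3).
# Assumes constant leakage current I_L and constant C_L * f.

def new_voltage(static_w: float, dynamic_w: float, voltage_v: float,
                reduction: float = 0.10) -> float:
    leakage_a = static_w / voltage_v                 # I_L = P_static / V
    cf = dynamic_w / voltage_v ** 2                  # C_L * f, in A/V
    target_w = (1 - reduction) * (static_w + dynamic_w)
    # Positive root of cf * V^2 + I_L * V - target = 0
    discriminant = leakage_a ** 2 + 4 * cf * target_w
    return (-leakage_a + math.sqrt(discriminant)) / (2 * cf)


v_prescott = new_voltage(static_w=10.0, dynamic_w=90.0, voltage_v=1.25)
v_ivy_bridge = new_voltage(static_w=30.0, dynamic_w=40.0, voltage_v=0.9)

print(f"Pentium 4 Prescott: {v_prescott:.3f} V")
print(f"Core i5 Ivy Bridge: {v_ivy_bridge:.3f} V")
```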

1.9

1.9.1

We can again use the following formula for the execution time of the program:

$$\mathrm{Execution\ time} = \frac{\mathrm{Clock\ cycles}}{\mathrm{Clock\ rate}}$$

For one processor, the number of clock cycles required to process the program is given by the summation of the different instruction classes, as explained in the answer to exercise 1.6:

$$\mathrm{Clock\ cycles_{1}} = \sum_{i=1}^{n} (\mathrm{CPI}_{i} \times C_{i}) = (1)(2.56 \times 10^{9}) + (12)(1.28 \times 10^{9}) + (5)(2.56 \times 10^{8}) = 1.92 \times 10^{10}$$

For more than one processor ($p > 1$), the number of cycles is given by:

$$\mathrm{Clock\ cycles_{p}} = \frac{(1)(2.56 \times 10^{9}) + (12)(1.28 \times 10^{9})}{0.7 \times p} + (5)(2.56 \times 10^{8})$$

| Number of processors | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| Execution time | 9.6 s | 7.04 s | 3.84 s | 2.24 s |
| Relative speed-up over 1 processor | 1 | 1.36 | 2.5 | 4.29 |
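The table can be regenerated with the sketch below. The 2 GHz clock rate is an assumption inferred from the table itself ($1.92 \times 10^{10}$ cycles taking 9.6 s); the function name and defaults are illustrative.

```python
# Sketch: execution time and speed-up versus processor count (exercise 1.9.1).

CLOCK_RATE_HZ = 2e9  # assumption: inferred from 1.92e10 cycles / 9.6 s

ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 2.56e8  # instruction counts


def clock_cycles(p: int, cpi_arith: float = 1, cpi_ls: float = 12,
                 cpi_branch: float = 5) -> float:
    parallel = cpi_arith * ARITH + cpi_ls * LOAD_STORE
    if p > 1:
        # arithmetic and load/store work is divided by 0.7 * p
        parallel /= 0.7 * p
    return parallel + cpi_branch * BRANCH


times = {p: clock_cycles(p) / CLOCK_RATE_HZ for p in (1, 2, 4, 8)}
speedups = {p: times[1] / t for p, t in times.items()}

for p in times:
    print(f"p={p}: {times[p]:.2f} s, speed-up {speedups[p]:.2f}")
```

Passing `cpi_arith=2` to `clock_cycles` gives the cycle counts for the doubled arithmetic CPI considered next.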

1.9.2

If the CPI for the arithmetic operations was doubled, then the new clock cycle counts would be:

$$\mathrm{Clock\ cycles_{1}} = (2)(2.56 \times 10^{9}) + (12)(1.28 \times 10^{9}) + (5)(2.56 \times 10^{8}) = 2.176 \times 10^{10}$$

$$\mathrm{Clock\ cycles_{p}} = \frac{(2)(2.56 \times 10^{9}) + (12)(1.28 \times 10^{9})}{0.7 \times p} + (5)(2.56 \times 10^{8}) = \frac{2.048 \times 10^{10}}{0.7 \times p} + (5)(2.56 \times 10^{8})$$, for $p > 1$.

| Number of processors | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| New execution time | 10.88 s | 7.95 s | 4.30 s | 2.47 s |
| Relative slow-down | 1.13 | 1.13 | 1.12 | 1.10 |

1.9.3

Since the clock rates are the same we can compare the number of clock cycles directly. Thus, we need to find a value of $\mathrm{CPI_{load/store}}$ such that the following equation is satisfied:

$$\mathrm{Clock\ cycles_{1_{new}}} = \mathrm{Clock\ cycles_{4}}$$

$$(1)(2.56 \times 10^{9}) + (\mathrm{CPI_{load/store}})(1.28 \times 10^{9}) + (5)(2.56 \times 10^{8}) = \frac{(1)(2.56 \times 10^{9}) + (12)(1.28 \times 10^{9})}{0.7 \times 4} + (5)(2.56 \times 10^{8})$$

Hence, the new value of $\mathrm{CPI_{load/store}}$ should be:

$$\mathrm{CPI_{load/store}} = \frac{3.84 \times 10^{9}}{1.28 \times 10^{9}} = 3$$
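Solving the equation for $\mathrm{CPI_{load/store}}$ can be written out in a few lines of Python. This is a sketch with illustrative variable names; the instruction counts and CPIs are the ones in the equation above.

```python
# Sketch: load/store CPI that lets one processor match four (exercise 1.9.3).

ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 2.56e8  # instruction counts

# Cycle count on four processors with the original CPIs (1, 12, 5)
cycles_4 = (1 * ARITH + 12 * LOAD_STORE) / (0.7 * 4) + 5 * BRANCH

# On one processor: 1*ARITH + cpi_ls*LOAD_STORE + 5*BRANCH = cycles_4
cpi_ls_needed = (cycles_4 - 1 * ARITH - 5 * BRANCH) / LOAD_STORE

print(f"cycles on 4 processors: {cycles_4:.3e}")
print(f"required load/store CPI: {cpi_ls_needed:.2f}")
```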

1.10

1.10.1

In order to use the yield equation we first obtain the approximate die areas.

$$\mathrm{Die\ area}_{1} \approx \frac{\mathrm{Wafer\ area}_{1}}{\mathrm{Die\ count}_{1}} = \frac{\pi (7.5\ \mathrm{cm})^{2}}{84} = 2.103745\ \mathrm{cm}^{2}$$

$$\mathrm{Die\ area}_{2} \approx \frac{\pi (10\ \mathrm{cm})^{2}}{100} = \pi\ \mathrm{cm}^{2}$$

We can now plug these values into the yield equation:

$$\mathrm{Yield}_{1} = \frac{1}{\left(1 + \mathrm{Defect\ rate}_{1} \cdot \frac{\mathrm{Die\ area}_{1}}{2}\right)^{2}} = \frac{1}{(1 + 0.020 (0.5)(2.103745))^{2}} = 0.959216$$

$$\mathrm{Yield}_{2} = \frac{1}{(1 + 0.031 (0.5)(\pi))^{2}} = 0.909289$$

1.10.2

Since we have the yields, we can apply the formula for cost per die immediately:

$$\mathrm{Cost\ per\ die}_{1} = \frac{\mathrm{Cost\ per\ wafer}_{1}}{\mathrm{Dies\ per\ wafer}_{1} \times \mathrm{Yield}_{1}} = \frac{12}{(84)(0.959216)} = 0.148931$$

$$\mathrm{Cost\ per\ die}_{2} = \frac{15}{(100)(0.909289)} = 0.164964$$
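The die area, yield and cost-per-die figures can be reproduced with the following Python sketch. Function names are illustrative; the wafer diameters (15 cm and 20 cm) are inferred from the radii of 7.5 cm and 10 cm used in the area formulas.

```python
import math

# Sketch: die area, yield and cost per die (exercises 1.10.1 and 1.10.2).

def die_area_cm2(wafer_diameter_cm: float, dies_per_wafer: int) -> float:
    """Approximate die area as wafer area divided by die count."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer


def die_yield(defects_per_cm2: float, area_cm2: float) -> float:
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2


def cost_per_die(wafer_cost: float, dies_per_wafer: int, yield_: float) -> float:
    return wafer_cost / (dies_per_wafer * yield_)


area_1 = die_area_cm2(15, 84)
area_2 = die_area_cm2(20, 100)
yield_1 = die_yield(0.020, area_1)
yield_2 = die_yield(0.031, area_2)
cost_1 = cost_per_die(12, 84, yield_1)
cost_2 = cost_per_die(15, 100, yield_2)

print(f"wafer 1: area {area_1:.6f} cm^2, yield {yield_1:.6f}, cost {cost_1:.6f}")
print(f"wafer 2: area {area_2:.6f} cm^2, yield {yield_2:.6f}, cost {cost_2:.6f}")
```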

1.10.3

For the first wafer:

$$\mathrm{Die\ area_{new}} = \frac{\mathrm{Wafer\ area}}{(1.1)(\mathrm{Die\ count})} = \frac{2.103745\ \mathrm{cm}^{2}}{1.1} = 1.912495\ \mathrm{cm}^{2}$$

$$\mathrm{Yield_{new}} = \frac{1}{\left(1 + (1.15)(\mathrm{Defect\ rate}) \frac{\mathrm{Die\ area_{new}}}{2}\right)^{2}} = \frac{1}{\left(1 + (1.15)(0.020) \frac{1.912495}{2}\right)^{2}} = 0.957422$$

For the second wafer:

$$\mathrm{Die\ area_{new}} = \frac{\pi\ \mathrm{cm}^{2}}{1.1} = 2.855993\ \mathrm{cm}^{2}$$

$$\mathrm{Yield_{new}} = \frac{1}{\left(1 + (1.15)(0.031) \frac{2.855993}{2}\right)^{2}} = 0.905462$$

1.10.4

Since the die area is 2 square centimeters, we find that the yield is given by:

$$\mathrm{Yield} = \frac{1}{\left(1 + (\mathrm{Defect\ rate}) \frac{2\ \mathrm{cm}^{2}}{2}\right)^{2}} = \frac{1}{\left(1 + \mathrm{Defect\ rate}\right)^{2}}$$

Solving for the defect rate we find

$$\mathrm{Defect\ rate} = \frac{1}{\sqrt{\mathrm{Yield}}} - 1$$

Thus, the previous defect rate was

$$\mathrm{Defect\ rate_{old}} = \frac{1}{\sqrt{0.92}} - 1 = 0.042572\ \frac{\mathrm{defects}}{\mathrm{cm}^{2}}$$

And the new one is

$$\mathrm{Defect\ rate_{new}} = \frac{1}{\sqrt{0.95}} - 1 = 0.025978\ \frac{\mathrm{defects}}{\mathrm{cm}^{2}}$$
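Inverting the yield equation is easy to check numerically. The sketch below uses a hypothetical helper that solves the yield equation for the defect rate at a general die area; with the 2 cm² die used above it reduces to the formula in the text.

```python
import math

# Sketch: defect rate recovered from a given yield (exercise 1.10.4).
# Inverts yield = 1 / (1 + defect_rate * die_area / 2)^2.

def defect_rate(yield_fraction: float, die_area_cm2: float = 2.0) -> float:
    return (1 / math.sqrt(yield_fraction) - 1) * 2 / die_area_cm2


rate_old = defect_rate(0.92)
rate_new = defect_rate(0.95)

print(f"old defect rate: {rate_old:.6f} defects/cm^2")
print(f"new defect rate: {rate_new:.6f} defects/cm^2")
```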

1.11

1.11.1

$$\mathrm{CPI} = \frac{\mathrm{CPU\ clock\ cycles}}{\mathrm{Instruction\ count}} = \frac{\mathrm{Execution\ time}}{\mathrm{Instruction\ count} \times \mathrm{Clock\ cycle\ time}} = \frac{750\ \mathrm{s}}{(2.389 \times 10^{12})(0.333 \times 10^{-9}\ \mathrm{s})} = 0.942759$$

A clock cycle time of 0.333 ns corresponds to a clock rate of about 3 GHz; with exactly $3 \times 10^{9}\ \mathrm{cycles \cdot s^{-1}}$ the result is 0.941817.

1.11.2

$$\mathrm{SPECratio} = \frac{\mathrm{Reference\ time}}{\mathrm{Measured\ time}} = \frac{9650\ \mathrm{s}}{750\ \mathrm{s}} = 12.866667$$

1.11.3

$$\mathrm{Time_{new}} = \mathrm{Instruction\ count_{new}} \times \mathrm{CPI} \times \mathrm{Clock\ cycle\ time} = (1.1 \times \mathrm{Instruction\ count}) \times \mathrm{CPI} \times \mathrm{Clock\ cycle\ time} = (1.1)(\mathrm{Time_{old}})$$

1.11.4

$$\mathrm{Time_{new}} = \mathrm{Instruction\ count_{new}} \times \mathrm{CPI_{new}} \times \mathrm{Clock\ cycle\ time} = (1.1 \times \mathrm{Instruction\ count}) \times (1.05 \times \mathrm{CPI}) \times \mathrm{Clock\ cycle\ time} = (1.155)(\mathrm{Time_{old}})$$

1.11.5

$$\mathrm{SPECratio_{new}} = \frac{\mathrm{Reference\ time}}{1.155 \times \mathrm{Measured\ time}} = \frac{12.866667}{1.155} = 11.139971$$

1.11.6

$$\mathrm{CPI} = \frac{(700\ \mathrm{s})(4 \times 10^{9}\ \mathrm{cycles \cdot s^{-1}})}{(0.85)(2.389 \times 10^{12})} = 1.378869$$

1.11.7

The change in CPI cannot be explained by the increase in clock rate alone. Since the clock rate increased 33% and the number of instructions decreased 15%, we would have expected a reduction in execution time of approximately $1 - \frac{0.85}{1.333333} = 36.25\%$, but the execution time only decreased 6.67%. Therefore, the CPI must have increased as well.

1.11.8

$$\mathrm{CPU\ time\ reduction} = \frac{750\ \mathrm{s} - 700\ \mathrm{s}}{750\ \mathrm{s}} = 6.67\%$$
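The figures from 1.11.1 through 1.11.8 can be recomputed with the sketch below. Variable names are illustrative. The 0.333 ns clock cycle time for the original processor is an assumption (roughly 3 GHz) chosen because it reproduces the CPI of 0.942759 quoted in 1.11.1.

```python
# Sketch: CPI, SPECratio and execution-time changes for exercise 1.11.

INSTRUCTION_COUNT = 2.389e12
OLD_CYCLE_TIME_S = 0.333e-9  # assumption: ~3 GHz, reproduces CPI 0.942759

# 1.11.1 and 1.11.2
cpi_old = 750 / (INSTRUCTION_COUNT * OLD_CYCLE_TIME_S)
spec_ratio = 9650 / 750

# 1.11.3 to 1.11.5: 10% more instructions and a 5% higher CPI
time_factor = 1.1 * 1.05
spec_ratio_new = spec_ratio / time_factor

# 1.11.6: 4 GHz clock, 15% fewer instructions, 700 s execution time
cpi_new = 700 * 4e9 / (0.85 * INSTRUCTION_COUNT)

# 1.11.7 and 1.11.8: expected versus actual reduction in execution time
expected_reduction = 1 - 0.85 / (4 / 3)
actual_reduction = (750 - 700) / 750

print(f"CPI old {cpi_old:.6f}, CPI new {cpi_new:.6f}")
print(f"SPECratio {spec_ratio:.6f} -> {spec_ratio_new:.6f}")
print(f"expected {expected_reduction:.2%}, actual {actual_reduction:.2%}")
```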

1.11.9

$$\mathrm{Instruction\ count} = \frac{\mathrm{Execution\ time} \times \mathrm{Clock\ rate}}{\mathrm{CPI}} = \frac{(0.9)(960\ \mathrm{s})(10^{-9})(4 \times 10^{9}\ \mathrm{cycles \cdot s^{-1}})}{1.61\ \frac{\mathrm{cycles}}{\mathrm{instruction}}} \approx 2146.58$$

1.11.10

We assume the additional reduction in execution time is over the time obtained in exercise 1.11.9, and thus use those parameters:

$$\mathrm{Clock\ rate} = \frac{\mathrm{Instruction\ count} \times \mathrm{CPI}}{\mathrm{Execution\ time}} = \frac{(2146\ \mathrm{instructions})\left(1.61\ \frac{\mathrm{cycles}}{\mathrm{instruction}}\right)}{(0.9)(960\ \mathrm{s})(10^{-9})} \approx 3.999\ \mathrm{GHz}$$

1.11.11

$$3.823\ \mathrm{GHz}$$

1.12

1.13

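1.14

$$\mathrm{Clock\ cycles} = (50 \times 10^{6})(1) + (110 \times 10^{6})(1) + (80 \times 10^{6})(4) + (16 \times 10^{6})(2) = 512 \times 10^{6}$$

$$\mathrm{Execution\ time} = \frac{512 \times 10^{6}}{2 \times 10^{9}\ \mathrm{s^{-1}}} = 256 \times 10^{-3}\ \mathrm{s} = 0.256\ \mathrm{s}$$

1.14.1

To make the program run twice as fast, the number of clock cycles must be halved:

$$\frac{\mathrm{Clock\ cycles}}{2} = 256 \times 10^{6} = (50 \times 10^{6})(\mathrm{CPI_{FP}}) + (110 \times 10^{6})(1) + (80 \times 10^{6})(4) + (16 \times 10^{6})(2)$$

$$\mathrm{CPI_{FP}} = \frac{256 \times 10^{6} - 462 \times 10^{6}}{50 \times 10^{6}} = -4.12$$

Since the result is negative, the target cannot be reached by improving $\mathrm{CPI_{FP}}$ alone. In terms of time, the target is $0.256\ \mathrm{s} \times \frac{1}{2} = 128 \times 10^{-3}\ \mathrm{s}$, but the non-FP instructions alone already take $231 \times 10^{-3}\ \mathrm{s}$ (the FP instructions account for the remaining $25 \times 10^{-3}\ \mathrm{s}$).

1.14.2

$$256 \times 10^{6} = (50 \times 10^{6})(1) + (110 \times 10^{6})(1) + (80 \times 10^{6})(\mathrm{CPI_{L/S}}) + (16 \times 10^{6})(2)$$

$$\mathrm{CPI_{L/S}} = \frac{(256 - 192) \times 10^{6}}{80 \times 10^{6}} = \frac{64 \times 10^{6}}{80 \times 10^{6}} = 0.8$$

The CPI of L/S instructions must therefore drop from 4 to 0.8, that is, to 0.2 of its original value, an improvement by a factor of $4 / 0.8 = 5$.

1.14.3

Execution time (improved) = 0.1712 s, which is 0.0848 s faster than the original execution time of 0.256 s.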
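The steps of this exercise can be sketched in Python as shown below. The instruction counts, CPIs and 2 GHz clock rate come from the calculations above. The CPI reductions used for the last result (40% for FP and INT, 30% for L/S and branch) are not stated in this answer; they are assumed here because they reproduce the 0.1712 s figure. All names are illustrative.

```python
# Sketch: cycle counts and required CPIs for exercise 1.14.

CLOCK_RATE_HZ = 2e9
COUNTS = {"FP": 50e6, "INT": 110e6, "LS": 80e6, "BRANCH": 16e6}
CPI = {"FP": 1.0, "INT": 1.0, "LS": 4.0, "BRANCH": 2.0}


def total_cycles(cpi_by_class: dict) -> float:
    return sum(COUNTS[c] * cpi_by_class[c] for c in COUNTS)


def cpi_needed(cls: str, target_cycles: float) -> float:
    """CPI one class would need so the whole program hits target_cycles."""
    others = sum(COUNTS[c] * CPI[c] for c in COUNTS if c != cls)
    return (target_cycles - others) / COUNTS[cls]


base_cycles = total_cycles(CPI)
target = base_cycles / 2               # run twice as fast

cpi_fp = cpi_needed("FP", target)      # negative, so not achievable
cpi_ls = cpi_needed("LS", target)

# Assumed reductions: FP and INT by 40%, L/S and branch by 30%
improved = {"FP": CPI["FP"] * 0.6, "INT": CPI["INT"] * 0.6,
            "LS": CPI["LS"] * 0.7, "BRANCH": CPI["BRANCH"] * 0.7}
time_improved = total_cycles(improved) / CLOCK_RATE_HZ

print(base_cycles / CLOCK_RATE_HZ, cpi_fp, cpi_ls, time_improved)
```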

1.15


Source: https://en.wikibooks.org/wiki/Solutions_To_Computer_Engineering_Textbooks/Computer_Organization_and_Design:_The_Hardware-Software_Interface_%285th_Edition%29_%289780124077263%29/Chapter_1