Adaptive Techniques for Dynamic Processor Optimization: Theory and Practice
Chapter 6 Dynamic Voltage Scaling with the XScale Embedded Microprocessor
As a concrete example, assume the processor is running at 1 GHz and
VDD = 1.75 V. If half of the cycles are stalls waiting for the bus, as
determined by a combination of the total clock count, instructions
executed, and data dependency stall or bus request counts, the VDD can be
adjusted to 1.2 V (see Figure 6.2) and the core frequency reduced to 500
MHz. Useful work is then performed in a greater fraction of the (fewer
overall) core clock cycles. Referring to Figure 6.2, the power savings are
nearly 50%, with the same work finished in the same amount of time.
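The arithmetic behind this example can be sketched in C. This is a simplified model, assuming the core frequency can be lowered in proportion to the fraction of cycles that are not stalls. The function name `scaled_core_mhz` and its parameters are hypothetical, and a real policy would still have to map the result to a safe VDD (Figure 6.2) and to the frequency steps the platform supports.

```c
#include <stdint.h>

/*
 * Illustrative only: scale the core frequency by the fraction of cycles
 * that were not stalls. Counts are assumed to come from 32-bit PMU
 * counters sampled over the same interval.
 */
static unsigned scaled_core_mhz(unsigned cur_mhz,
                                uint32_t total_cycles,
                                uint32_t stall_cycles)
{
    /* No usable measurement: keep the current frequency. */
    if (total_cycles == 0 || stall_cycles >= total_cycles)
        return cur_mhz;

    /* 64-bit intermediate so cur_mhz * cycles cannot overflow. */
    uint64_t busy = (uint64_t)total_cycles - stall_cycles;
    return (unsigned)(((uint64_t)cur_mhz * busy) / total_cycles);
}
```

With the numbers from the example (1 GHz, half of the cycles stalled), `scaled_core_mhz(1000, 1000000, 500000)` gives 500.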
6.2 Dynamic Voltage Scaling on the XScale Microprocessor
This section describes experimental results from running DVS on the 180 nm
XScale microprocessor. The value of DVS is evident in Figure 6.3. Here,
the 80200 microprocessor is shown functioning across a power range from
10 mW in idle mode, up to 1.5 W at 1 GHz clock frequency. The idle
mode power is dominated by the PLL and clock generation unit. The
processor core includes the capacity to apply reverse-body bias and supply
collapse [10, 11] to the core transistors for fully state-retentive
power-down. The microprocessor core consumes 100 μW in the low standby
“Drowsy” mode [12]. The PLL and clock divider unit must be restarted
when leaving Drowsy mode. When running with a clock frequency of 200
MHz, the VDD can be reduced to 700 mV, providing power dissipation less
than 45 mW.
Figure 6.3 The value of dynamic voltage scaling is evident from this plot of the
80200 power and VDD voltage over time. The power lags due to the latency of the
measurement and time averaging.
[Figure 6.3 plot. Horizontal axis: Time (arbitrary scale). Vertical axes: VDD (V), Clock Frequency (MHz), Power (mW). Legend: Frequency, Power (ave. samples), Voltage.]
Lawrence T. Clark, Franco Ricci, William E. Brown
6.2.1 Running DVS
To demonstrate DVS on the XScale, a synthetic benchmark is run on
the LRH demonstration board. The onboard voltage
regulator is bypassed, and a daughter-card using a Lattice GAL22v10 PLD
controller and a Maxim MAX1855 DC-DC converter evaluation kit is
added. The DC-DC converter output voltage can vary from 0.6 to 1.75 V.
The control is memory mapped, allowing software to control the processor
core VDD.
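A memory-mapped control of this kind might be driven from software roughly as follows. This is a sketch under stated assumptions: the text gives neither the register address nor the code format the PLD expects, so the register is passed in as a pointer, the encoding is supplied by the caller, and `set_core_vdd`, `vdd_encode_fn`, and `encode_passthrough` are hypothetical names. Only the 0.6 V to 1.75 V range comes from the text.

```c
#include <stdint.h>

#define VDD_MIN_MV 600u   /* lower end of the converter range (0.6 V) */
#define VDD_MAX_MV 1750u  /* upper end of the converter range (1.75 V) */

/* Hypothetical: maps millivolts to the code the control register expects. */
typedef uint32_t (*vdd_encode_fn)(unsigned mv);

/* Placeholder encoder for illustration: passes millivolts through as-is. */
static uint32_t encode_passthrough(unsigned mv)
{
    return (uint32_t)mv;
}

/*
 * Clamp the request to the converter range, write the encoded value to
 * the memory-mapped control register, and return the clamped request.
 */
static unsigned set_core_vdd(volatile uint32_t *ctrl_reg,
                             vdd_encode_fn encode,
                             unsigned mv)
{
    if (mv < VDD_MIN_MV)
        mv = VDD_MIN_MV;
    if (mv > VDD_MAX_MV)
        mv = VDD_MAX_MV;
    *ctrl_reg = encode(mv);
    return mv;
}
```

On real hardware, `ctrl_reg` would be the board-specific address decoded by the PLD, and the encoder would produce whatever code the converter's voltage-select inputs require.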
The synthetic benchmark loops between a basic block of code that has a
data set that fits entirely in the cache (these pages are configured for
write-back mode) and one that is non-cacheable and non-bufferable. The latter
requires many more bus operations, since the bus frequency of 100 MHz is
lower than the core clock frequency, which must be at least 3× the bus
frequency on the demonstration board.
The code monitors the actual operational CPI using the processor's
performance monitoring unit (PMU). It tracks the number of executed
instructions and the number of clocks elapsed since the PMU was
initialized and counting began. The C code, with inline assembly to
perform the low-level functions, is:
unsigned int count0, count1, count2;

int cpi() {
    unsigned int pmnc;

    // read the PMNC register, clear the enable bit, and write it back
    // (clears the interrupt flag and disables counting)
    asm volatile("mrc p14, 0, %0, c0, c0, 0" : "=r" (pmnc));
    pmnc &= ~1u;
    asm volatile("mcr p14, 0, %0, c0, c0, 0" :: "r" (pmnc));

    // read the performance counters
    asm volatile("mrc p14, 0, %0, c1, c0, 0" : "=r" (count0)); // CCNT register
    asm volatile("mrc p14, 0, %0, c2, c0, 0" : "=r" (count1)); // CP14 register c2
    asm volatile("mrc p14, 0, %0, c3, c0, 0" : "=r" (count2)); // CP14 register c3

    return (int) count0;
}
int startcounters() {
    // set up and turn on the performance counters
    unsigned int z = 0x00710707; // PMNC initialization value
    asm volatile("mcr p14, 0, %0, c0, c0, 0" :: "r" (z)); // write it to PMNC
    return 0;
}
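Once the clock count and the executed-instruction count have been read back, CPI is their ratio. A minimal sketch, assuming the caller passes the two counts explicitly. The function name `cpi_milli` and the fixed-point scaling by 1000 are illustrative choices, not part of the listing above, which does not say which counter holds the instruction count.

```c
#include <stdint.h>

/*
 * Illustrative only: cycles per instruction, scaled by 1000 so that no
 * floating point is needed (2000 means a CPI of 2.0).
 * Returns 0 when no instructions were counted.
 */
static uint64_t cpi_milli(uint32_t clocks, uint32_t instructions)
{
    if (instructions == 0)
        return 0;
    /* 64-bit intermediate so clocks * 1000 cannot overflow. */
    return ((uint64_t)clocks * 1000u) / instructions;
}
```

For example, 1500 clocks over 1000 instructions gives `cpi_milli(1500, 1000) == 1500`, a CPI of 1.5.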