SRAM Design
Memory
Chapter Overview
• Memory Classification
• Memory Architectures
• The Memory Core
• Periphery
• Reliability
Semiconductor Memory Classification
• RWM (read-write memory), random access: SRAM, DRAM
• RWM, non-random access: FIFO, LIFO, shift register, CAM
• NVRWM (non-volatile read-write memory): EPROM, E2PROM, FLASH
• ROM (read-only memory): mask-programmed, programmable (PROM)
Memory Architecture: Decoders
[Figure: two organizations of an N-word memory with M bits per word and an M-bit input-output. Left: each storage word (Word 0 ... Word N-1) has its own select signal S0 ... SN-1. Right: a decoder driven by address bits A0 ... AK-1 generates the word selects.]
• N words => N select signals: too many select signals
• Decoder reduces the number of select signals: K = log2 N
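The relation K = log2 N can be illustrated with a small behavioral sketch. The function names below are hypothetical and chosen only for illustration; the model is purely logical and says nothing about the circuit implementation.

```python
def address_bits(n_words: int) -> int:
    """Number of address bits K needed to select one of n_words words."""
    if n_words < 1:
        raise ValueError("n_words must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) without floating-point error.
    return (n_words - 1).bit_length()


def decode(address: int, n_words: int) -> list[int]:
    """One-hot select signals S0..S(N-1) for a binary address."""
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]


if __name__ == "__main__":
    print(address_bits(1024))   # address bits for a 1024-word memory
    print(decode(2, 4))         # select signals for address 2 of 4 words
```

With a decoder, only K address wires enter the memory instead of N select wires.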
Array-Structured Memory Architecture
Problem: ASPECT RATIO or HEIGHT >> WIDTH
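[Figure: array of storage cells with 2^(L-K) rows and M·2^K columns. The row decoder (address bits AK ... AL-1) drives the word lines; the bit lines feed the sense amplifiers / drivers; the column decoder (address bits A0 ... AK-1) connects to the M-bit input-output.]
• Sense amplifiers / drivers: amplify swing to rail-to-rail amplitude
• Column decoder: selects appropriate word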
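As a rough sketch of how an L-bit address is divided in the array organization, the following assumes the K low-order bits (A0 ... AK-1) go to the column decoder and the remaining L-K bits go to the row decoder, as labeled in the figure. The function names are hypothetical.

```python
def split_address(address: int, total_bits: int, column_bits: int) -> tuple[int, int]:
    """Split an L-bit address into (row, column).

    The low `column_bits` bits select the word within a row; the remaining
    high bits select the row (word line).
    """
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address does not fit in total_bits")
    column = address & ((1 << column_bits) - 1)
    row = address >> column_bits
    return row, column


def array_shape(total_bits: int, column_bits: int, word_bits: int) -> tuple[int, int]:
    """Rows and bit-line columns of the cell array: 2^(L-K) by M * 2^K."""
    rows = 1 << (total_bits - column_bits)
    columns = word_bits << column_bits
    return rows, columns


if __name__ == "__main__":
    print(split_address(0b101101, total_bits=6, column_bits=2))
    print(array_shape(total_bits=10, column_bits=3, word_bits=8))
```

Choosing K moves the array toward a square shape, which is the point of the aspect-ratio remark above.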
Hierarchical Memory Architecture
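[Figure: memory partitioned into blocks. Each block receives the row address and column address; the block address drives a block selector; the blocks share a global data bus that connects to the global amplifier/driver and I/O. Control circuitry is shown alongside.]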
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
Memory Timing: Definitions
[Timing diagram: READ, WRITE, and DATA waveforms annotated with read cycle, read access, write cycle, write access, data valid, and data written.]
Memory Timing: Approaches
• DRAM timing: multiplexed addressing. The row address (MSB part) and column address (LSB part) share one address bus and are strobed in by RAS and CAS (RAS-CAS timing).
• SRAM timing: self-timed. An address transition on the address bus initiates the memory operation.
Read-Write Memories (RAM)
• STATIC (SRAM)
Data stored as long as supply is applied
Large (6 transistors/cell)
Fast
Differential
• DYNAMIC (DRAM)
Periodic refresh required
Small (1-3 transistors/cell)
Slower
Single Ended
6-transistor CMOS SRAM Cell
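[Schematic: two cross-coupled inverters (M1/M2 and M3/M4) between VDD and ground hold Q and its complement. Access transistors M5 and M6, gated by the word line WL, connect the two storage nodes to the bit line BL and its complement.]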
CMOS SRAM Analysis (Write)
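[Schematic: 6T cell during a write with WL high. The cell holds Q = 1 (complement node at 0). BL = 0 is applied to Q through M6, which works against M4; the complementary bit line = 1 is applied through M5, which works against M1.]

Pulling Q down through M6 against M4:
$k_{n,M6}\left[(V_{DD} - V_{Tn})\frac{V_{DD}}{2} - \frac{V_{DD}^2}{8}\right] = k_{p,M4}\left[(V_{DD} - |V_{Tp}|)\frac{V_{DD}}{2} - \frac{V_{DD}^2}{8}\right]$
(W/L)n,M6 ≥ 0.33 (W/L)p,M4

Pulling the complement node up through M5 against M1:
$\frac{k_{n,M5}}{2}\left(\frac{V_{DD}}{2} - V_{Tn}\right)^2 = k_{n,M1}\left[(V_{DD} - V_{Tn})\frac{V_{DD}}{2} - \frac{V_{DD}^2}{8}\right]$
(W/L)n,M5 ≥ 10 (W/L)n,M1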
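The two current-balance conditions on this slide can be evaluated numerically. The sketch below is only illustrative: the supply voltage, threshold voltages, and mobility ratio are assumed values, not taken from the slide, so the printed ratios need not match the 0.33 and 10 quoted above. It assumes k = μ·Cox·(W/L) with the same Cox for both device types.

```python
def ratio_m6_m4(vdd: float, vtn: float, vtp_abs: float, mu_p_over_mu_n: float) -> float:
    """(W/L)_M6 / (W/L)_M4 at which M6 just pulls Q down to VDD/2 against M4.

    Both devices are treated as operating in the linear region with VDS = VDD/2.
    """
    n_term = (vdd - vtn) * vdd / 2 - vdd**2 / 8
    p_term = (vdd - vtp_abs) * vdd / 2 - vdd**2 / 8
    # k_n,M6 * n_term = k_p,M4 * p_term, with k proportional to mobility * (W/L)
    return mu_p_over_mu_n * p_term / n_term


def ratio_m5_m1(vdd: float, vtn: float) -> float:
    """(W/L)_M5 / (W/L)_M1 at which the low node sits at VDD/2.

    M5 is saturated (source at VDD/2); M1 is linear with VDS = VDD/2.
    """
    m1_term = (vdd - vtn) * vdd / 2 - vdd**2 / 8
    m5_term = 0.5 * (vdd / 2 - vtn) ** 2
    return m1_term / m5_term


if __name__ == "__main__":
    # Assumed example values (hypothetical process): VDD = 5 V, |VT| = 1 V, mu_p/mu_n = 1/3
    print(ratio_m6_m4(5.0, 1.0, 1.0, 1 / 3))
    # Assumed effective VTn of 1.5 V for M5 (for example, including body effect)
    print(ratio_m5_m1(5.0, 1.5))
```

When the two threshold magnitudes are equal, the first ratio reduces to the mobility ratio, which is where a value near one third comes from.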
CMOS SRAM Analysis (Read)
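[Schematic: 6T cell during a read with WL high. Both bit lines, each loaded by Cbit, are precharged to VDD. The cell holds Q = 1 with the complement node at 0, so M5 and M1 form a divider on the low side.]

$\frac{k_{n,M5}}{2}\left(\frac{V_{DD}}{2} - V_{Tn}\right)^2 = k_{n,M1}\left[(V_{DD} - V_{Tn})\frac{V_{DD}}{2} - \frac{V_{DD}^2}{8}\right]$
(W/L)n,M5 ≤ 10 (W/L)n,M1 (supersedes the write constraint on M5)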
6T-SRAM — Layout
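[Layout figure with labels: VDD rail; M2 and M4; storage nodes Q and its complement; M1 and M3; GND rail; access transistors M5 and M6 on the word line WL; bit lines BL and its complement.]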
Resistance-load SRAM Cell
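[Schematic: cross-coupled NMOS pair M1/M2 with load resistors RL to VDD holding Q and its complement; access transistors M3 and M4, gated by WL, connect to the bit lines BL and its complement.]
• Static power dissipation: want RL large
• Bit lines precharged to VDD to address the tp (propagation delay) problem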
Periphery
• Decoders
• Sense Amplifiers
• Input/Output Buffers
• Control / Timing Circuitry
Row Decoders
Collection of 2^M complex logic gates
Organized in regular and dense fashion
(N)AND Decoder
NOR Decoder
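The two decoder styles compute the same word-line function. A minimal logical sketch, assuming each word line is the AND of M true or complemented address literals and that the NOR form uses the opposite literals (De Morgan's law); the function names are hypothetical and the model ignores all circuit-level detail:

```python
from itertools import product


def and_decoder(bits: tuple[int, ...], index: int) -> int:
    """Word line `index` as an AND of address literals; bits[i] is A_i."""
    literals = [b if (index >> i) & 1 else 1 - b for i, b in enumerate(bits)]
    return int(all(literals))


def nor_decoder(bits: tuple[int, ...], index: int) -> int:
    """The same word line as a NOR of the complementary literals."""
    literals = [1 - b if (index >> i) & 1 else b for i, b in enumerate(bits)]
    return int(not any(literals))


def forms_agree(m: int) -> bool:
    """Check both forms for every address and every one of the 2^M word lines."""
    return all(
        and_decoder(bits, w) == nor_decoder(bits, w)
        for bits in product((0, 1), repeat=m)
        for w in range(2**m)
    )


if __name__ == "__main__":
    print(forms_agree(3))
```

For every address exactly one of the 2^M gates evaluates to 1, which is why the decoder needs one gate per word line.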
Dynamic Decoders
[Figure: left, dynamic 2-to-4 NOR decoder with precharge devices; right, 2-to-4 MOS dynamic NAND decoder. Both drive WL0 ... WL3 from A0, A1 and their complements, and are clocked by φ.]
Propagation delay is primary concern
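A NAND decoder using 2-input pre-decoders
[Figure: address bits A0, A1 and A2, A3 are pre-decoded in pairs into all four combinations of true and complemented literals; each word line (WL0, WL1, ...) combines one pre-decoded line from each pair.]
Splitting the decoder into two or more logic layers produces a faster and cheaper implementation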
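A behavioral sketch of the two-layer idea for a 4-bit address, assuming two 2-to-4 pre-decoders (for A0, A1 and for A2, A3) followed by one 2-input gate per word line. The names are hypothetical and only the logic function is modeled, not delay or area.

```python
def predecode_2to4(a_low: int, a_high: int) -> list[int]:
    """2-to-4 pre-decoder: one-hot list indexed by the 2-bit value (a_high, a_low)."""
    value = (a_high << 1) | a_low
    return [int(i == value) for i in range(4)]


def word_lines(address: int) -> list[int]:
    """16 word lines built from two pre-decoders and one 2-input AND per line."""
    a = [(address >> i) & 1 for i in range(4)]
    low = predecode_2to4(a[0], a[1])    # decodes A0, A1
    high = predecode_2to4(a[2], a[3])   # decodes A2, A3
    return [high[w >> 2] & low[w & 3] for w in range(16)]


if __name__ == "__main__":
    print(word_lines(9))
```

Each final gate has two inputs instead of four, and the eight pre-decoded lines are shared by all sixteen word lines.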
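4-input pass-transistor based column decoder
[Figure: a 2-input NOR decoder driven by A0 and A1 generates the select signals S0 ... S3; each select gates one pass transistor connecting one of the bit lines BL0 ... BL3 to the data line D.]
Advantage: speed (tpd does not add to overall memory access time); only 1 extra transistor in signal path
Disadvantage: large transistor count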
4-to-1 tree based column decoder
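[Figure: binary tree of pass transistors. The first level, gated by A0 and its complement, selects between adjacent bit lines BL0 ... BL3; the second level, gated by A1 and its complement, selects the result onto the data line D.]
Number of devices drastically reduced
Delay increases quadratically with # of sections; prohibitive for large decoders
Solutions:
• buffers
• progressive sizing
• combination of tree and pass transistor approaches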
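The device-count and delay trade-off can be made concrete with a rough sketch. Two assumptions are made here that are not stated on the slide: each of the 2^K gates in the pass-transistor decoder's select logic is taken to cost K + 1 transistors, and the series path of the tree is modeled as a uniform RC chain whose delay follows the Elmore formula. All function names are hypothetical.

```python
def pass_transistor_devices(k: int) -> int:
    """Devices for a 2^K-input pass-transistor column decoder.

    Assumption: 2^K select gates at K + 1 transistors each, plus 2^K pass devices.
    """
    return (k + 1) * 2**k + 2**k


def tree_devices(k: int) -> int:
    """Devices for a 2^K-input tree decoder: 2^K + 2^(K-1) + ... + 2."""
    return sum(2**level for level in range(1, k + 1))


def elmore_chain_delay(sections: int, r: float = 1.0, c: float = 1.0) -> float:
    """Elmore delay of a uniform RC chain: R*C * n*(n+1)/2, quadratic in n."""
    return sum(i * r * c for i in range(1, sections + 1))


if __name__ == "__main__":
    for k in (2, 4, 8, 10):
        print(k, pass_transistor_devices(k), tree_devices(k), elmore_chain_delay(k))
```

The tree saves devices but puts K transistors in series, which is why buffers or a mixed tree and pass-transistor organization are listed as remedies.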
Decoder for circular shift-register
[Figure: chain of shift-register stages clocked by φ and its complement, one stage per word line (WL0, WL1, WL2, ...), each with a reset input R.]
Sense Amplifiers
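$t_p = \frac{C \cdot \Delta V}{I_{av}}$
C is large and Iav is small, so make ∆V as small as possible.
Idea: Use Sense Amplifier
[Figure: a small transition at the sense amplifier (s.a.) input is converted into a full transition at its output.]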
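A quick numeric illustration of tp = C·∆V / Iav shows why a small bit-line swing helps. The capacitance, current, and swing values below are assumed for illustration only.

```python
def propagation_delay(capacitance: float, delta_v: float, i_avg: float) -> float:
    """t_p = C * dV / I_av, in seconds for SI inputs."""
    if i_avg <= 0:
        raise ValueError("average current must be positive")
    return capacitance * delta_v / i_avg


if __name__ == "__main__":
    c_bit = 1e-12      # assumed bit-line capacitance: 1 pF
    i_cell = 100e-6    # assumed average cell current: 100 uA
    for swing in (2.5, 0.5, 0.1):   # assumed bit-line swings in volts
        print(swing, propagation_delay(c_bit, swing, i_cell))
```

With C and Iav fixed by the array and the cell, the delay scales linearly with the swing the sense amplifier needs.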
Differential Sensing - SRAM
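[Figure, three panels:
(a) SRAM sensing scheme: bit lines BL and its complement with precharge (PC) and equalization (EQ) devices, SRAM cell i on word line WLi, and a differential sense amplifier with inputs x, outputs y, and sense enable SE producing D and its complement.
(b) Double-ended current mirror amplifier: devices M1 ... M5 with inputs x, outputs y, and enable SE.
(c) Cross-coupled amplifier with inputs x, outputs y, and enable SE.]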
Latch-Based Sense Amplifier
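[Schematic: cross-coupled inverter latch connected between BL and its complement, with an equalization device controlled by EQ and supply-side enable devices controlled by SE and its complement.]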
Initialized in its meta-stable point with EQ
Once adequate voltage gap created, sense amp enabled with SE
Positive feedback quickly forces output to a stable operating point.
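The effect of positive feedback can be sketched with a first-order model. This is an assumption, not something stated on the slide: near the metastable point the differential voltage is taken to grow as ∆V(t) = ∆V0·exp(t/τ), where τ is a regeneration time constant set by the latch. The function name and the numbers are hypothetical.

```python
import math


def regeneration_time(tau: float, delta_v0: float, delta_v_final: float) -> float:
    """Time for an initial gap delta_v0 to grow to delta_v_final.

    Linearized latch model: dV(t) = delta_v0 * exp(t / tau).
    """
    if delta_v0 <= 0 or delta_v_final <= delta_v0:
        raise ValueError("need 0 < delta_v0 < delta_v_final")
    return tau * math.log(delta_v_final / delta_v0)


if __name__ == "__main__":
    tau = 50e-12   # assumed regeneration time constant: 50 ps
    for gap in (0.2, 0.1, 0.05):   # assumed initial bit-line gaps in volts
        print(gap, regeneration_time(tau, gap, 2.5))
```

In this model the resolution time grows only logarithmically as the initial gap shrinks, which is consistent with enabling the amplifier once an adequate gap exists.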
Single-to-Differential Conversion
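[Figure: a single-ended cell on word line WL drives bit line BL into one input (x) of a differential sense amplifier (Diff. S.A.); the other input is tied to a reference voltage Vref; the outputs are y and its complement.]
How to make a good Vref?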
Address Transition Detection
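[Figure: each address input A0, A1, ..., AN-1 passes through a DELAY element (td) so that a change on the input can be detected; the per-input pulses are combined into a single ATD signal.]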
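A discrete-time sketch of the detection idea, assuming each address bit is compared with a copy of itself delayed by td and that the per-bit results are ORed into one ATD pulse. The comparison-by-XOR structure is an assumption about the circuit, and the function name is hypothetical.

```python
def atd(samples: list[tuple[int, ...]], delay: int) -> list[int]:
    """ATD output per time step for a sampled address waveform.

    samples[t] is the address word (one bit per element) at time step t;
    `delay` is td expressed in time steps.
    """
    output = []
    for t, current in enumerate(samples):
        delayed = samples[max(t - delay, 0)]   # hold the first value before t = delay
        changed = any(c != d for c, d in zip(current, delayed))
        output.append(int(changed))
    return output


if __name__ == "__main__":
    address = [(0, 0), (0, 0), (1, 0), (1, 0), (1, 0), (1, 1), (1, 1)]
    print(atd(address, delay=2))
```

Each address change produces a pulse whose width equals the delay td, which can then start the self-timed memory operation.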
Reliability and Yield
Open Bit-line Architecture —Cross Coupling
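[Figure: sense amplifier in the middle with bit line BL and its complement extending to opposite sides. Each side has word lines WL0, WL1 and a dummy word line WLD, cell capacitors C, bit-line capacitance CBL, and word-line-to-bit-line coupling capacitance CWBL. EQ equalizes the two bit lines.]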
Folded-Bitline Architecture
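[Figure: bit line BL and its complement run side by side to the sense amplifier (inputs x, outputs y). Word lines WL0, WL1 and dummy word lines WLD cross both bit lines; each bit line has capacitance CBL, cell capacitors C, and coupling capacitance CWBL to the word lines. EQ equalizes the pair.]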
Transposed-Bitline Architecture
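[Figure, two panels showing a bit-line pair (BL and its complement) feeding a sense amplifier SA, with neighboring bit lines BL' and BL" coupled through Ccross:
(a) Straightforward bitline routing.
(b) Transposed bitline architecture.]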
Alpha-particles
[Figure: cross-section of a memory cell (WL, VDD, BL, SiO2, n+ region) struck by an α-particle.]
1 particle ~ 1 million carriers
Yield
Yield curves at different stages of process maturity
(from [Veendrick92])
Redundancy
[Figure: memory array with row decoder and column decoder, extended with redundant rows, redundant columns, and a fuse bank. Inputs are the row address and the column address.]
Semiconductor Memory Trends
Memory Size as a function of time: x 4 every three years
Semiconductor Memory Trends
Increasing die size: factor 1.5 per generation
Combined with reducing cell size: factor 2.6 per generation
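The two factors combine to roughly the x4-per-generation growth quoted for memory size. A short arithmetic sketch; the function names are hypothetical, and the three-year generation length is taken from the trend quoted earlier in this deck.

```python
def capacity_growth_per_generation(die_factor: float = 1.5,
                                   cell_shrink_factor: float = 2.6) -> float:
    """Bits per die grow by (die area growth) x (cell area reduction)."""
    return die_factor * cell_shrink_factor


def capacity_multiplier(years: float,
                        factor_per_generation: float = 4.0,
                        years_per_generation: float = 3.0) -> float:
    """Capacity growth after `years` at a fixed factor per generation."""
    return factor_per_generation ** (years / years_per_generation)


if __name__ == "__main__":
    print(capacity_growth_per_generation())   # close to 4
    print(capacity_multiplier(9))             # three generations
```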
Semiconductor Memory Trends
Technology feature size for different SRAM generations