FPGA Design and Implementation

ASIC & VLSI
• Time-to-market: some large ASICs can take a year or more to design.
• Design issues: a great deal of time is needed to handle the mapping, placement, routing, and timing.
• The FPGA design flow eliminates the complex and time-consuming custom floorplanning, place-and-route, and timing analysis work required for an ASIC.
CONCEPTUAL FPGA
[Diagram: conceptual FPGA architecture showing logic blocks, interconnect resources, and I/O cells]
FPGA
• Speed: on-chip memory can be Block RAM (BRAM) or distributed RAM.
• RAM loses its data when power is removed; size and cost are also concerns.
• Floating-point versus fixed-point arithmetic issues.
• Flexibility.

Design Flow Process Diagram
Design Entry → Technology Mapping → Placement → Routing → Programming Unit → Configured FPGA
Why HDL?
• HDLs allow the designer to implement and verify complex hardware functionality at a high level, without having to know the details of the low-level implementation.

• Advantages:
• FPGAs have lower prototyping costs.
• FPGAs have shorter production times.

• Synthesis: the process that translates VHDL code into a complete circuit of logic elements (gates, flip-flops, etc.).
Maximum Throughput Designs
• Dataflow
• Unrolling
• Pipelining
• Merging
Loop Unrolling

• Arrays a[i], b[i], and c[i] are mapped to RAMs.

• Rolled loop: this implementation takes four clock cycles, uses one multiplier, and each RAM can be single-port.
• Unrolled loop: the entire loop operation can be performed in a single clock cycle. This requires four multipliers and the ability to perform four reads and four writes in the same clock cycle, so it may require the arrays to be implemented as register arrays rather than RAMs.
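A minimal HLS-style C++ sketch of the rolled versus unrolled trade-off described above; the function name, the multiply operation, and the Vivado/Vitis HLS UNROLL pragma placement are assumptions for illustration, not taken from the slides.

// Hypothetical kernel: c[i] = a[i] * b[i] over four elements.
// Without the UNROLL pragma the loop is rolled: one iteration per
// clock cycle, one multiplier, and single-port RAMs are enough.
void vec_mul(int a[4], int b[4], int c[4]) {
    for (int i = 0; i < 4; i++) {
#pragma HLS UNROLL
        // Unrolled: all four multiplies are issued in one cycle, which
        // needs four multipliers and four parallel reads/writes, so the
        // arrays may become register arrays rather than RAMs.
        c[i] = a[i] * b[i];
    }
}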
Loop Merging
Pipelining
• Pipelining allows operations to happen concurrently by overlapping the execution of successive loop iterations or function calls.
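A minimal sketch of loop pipelining, assuming a Vivado/Vitis HLS-style PIPELINE pragma; the function and array names are illustrative only.

// Hypothetical dot-product loop: with PIPELINE II=1 a new iteration
// starts every clock cycle, so the read, multiply, and accumulate
// stages of different iterations execute concurrently.
int dot4(int a[4], int b[4]) {
    int acc = 0;
    for (int i = 0; i < 4; i++) {
#pragma HLS PIPELINE II=1
        acc += a[i] * b[i];
    }
    return acc;
}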
Pipelining
• Function pipelining is only possible when there is no resource contention or data dependency that prevents it. If the input array "m[2]" is implemented with a single-port RAM, the function cannot be pipelined, because the two read operations on "m[2]" ("op_Read_m[0]" and "op_Read_m[1]") cannot be performed in the same clock cycle.

• Solution: the resource contention can be resolved by using a dual-port RAM for array "m[2]", allowing both reads to be performed in the same clock cycle, or by increasing the initiation interval of the pipeline.
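A hedged sketch of the dual-port fix, assuming the Vitis HLS BIND_STORAGE pragma (older Vivado HLS releases used a RESOURCE pragma for the same purpose); the function body and names are illustrative.

// Hypothetical function reading both elements of m every iteration.
// Binding m to a dual-port RAM lets the two reads happen in the same
// clock cycle, so the loop can be pipelined with II=1.
void read_both(int m[2], int out[16]) {
#pragma HLS BIND_STORAGE variable=m type=ram_2p
    for (int i = 0; i < 16; i++) {
#pragma HLS PIPELINE II=1
        out[i] = m[i & 1] + m[(i + 1) & 1];   // two reads of m in one cycle
    }
}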
Array Optimizations

Array Optimizations
• Mapping: when there are many small arrays, mapping them into a single large array reduces the storage overhead.
• Partitioning: if each small array gets a separate memory, a lot of memory space is potentially wasted, the design becomes large, and power consumption grows accordingly.

• Horizontal mapping: this corresponds to creating a new array by concatenating the original arrays. Physically, it is implemented as a single array with more elements.
• Vertical mapping: this corresponds to creating a new array by concatenating the original words of the arrays. Physically, it is implemented as a single array with a larger bit-width.
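A minimal sketch of both mapping styles, assuming the ARRAY_MAP pragma from classic Vivado HLS (the pragma usage, instance names, and array sizes are assumptions; newer tool versions may expose this optimization differently).

// Two small arrays share one physical memory.
void mapping_example(int d_in[8], int d_out[8]) {
    int array1[8];
    int array2[8];
    // Horizontal mapping: array1 and array2 are concatenated into a
    // single RAM ("shared_mem") with 16 elements.
#pragma HLS ARRAY_MAP variable=array1 instance=shared_mem horizontal
#pragma HLS ARRAY_MAP variable=array2 instance=shared_mem horizontal
    // Vertical mapping would instead concatenate the words, giving one
    // RAM with 8 elements of double the bit-width, e.g.:
    //   #pragma HLS ARRAY_MAP variable=array1 instance=wide_mem vertical
    //   #pragma HLS ARRAY_MAP variable=array2 instance=wide_mem vertical
    for (int i = 0; i < 8; i++) {
        array1[i] = d_in[i];
        array2[i] = d_in[i] + 1;
    }
    for (int i = 0; i < 8; i++) {
        d_out[i] = array1[i] + array2[i];
    }
}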
Horizontal mapping

Horizontal mapping
• Although horizontal mapping can result in using fewer RAM components and hence improve area, it can have an impact on throughput and performance.

• In the previous example, the accesses to "array1" and "array2" can both be performed in the same clock cycle.

• If both arrays are mapped to the same RAM, each read operation now requires a separate access, and a separate clock cycle.
Vertical mapping
Array Partitioning
• Arrays can also be partitioned into smaller arrays, because a single memory has a limited number of read and write ports, which can limit the throughput of a load/store-intensive algorithm.
• Bandwidth can sometimes be improved by splitting the original array (a single memory resource) into multiple smaller arrays (multiple memories), effectively increasing the number of ports.
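A hedged sketch of cyclic partitioning, assuming the Vivado/Vitis HLS ARRAY_PARTITION pragma; the partition factor, loop structure, and names are illustrative.

// Cyclic partitioning by 4 creates four physical memories, so four
// consecutive elements of data (and result) can be accessed in the
// same clock cycle, letting the pipelined outer loop reach II=1.
void partition_example(int data[128], int result[128]) {
#pragma HLS ARRAY_PARTITION variable=data cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=result cyclic factor=4
    for (int i = 0; i < 32; i++) {
#pragma HLS PIPELINE II=1
        for (int j = 0; j < 4; j++) {   // fully unrolled under the pipeline
            result[4 * i + j] = data[4 * i + j] * 2;
        }
    }
}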
Array Partitioning
• If the elements of an array are accessed one at a time, an efficient hardware implementation is to keep them grouped together and mapped into a RAM.

• If multiple elements of an array are required simultaneously, it may be more advantageous for performance to implement them as individual registers, allowing parallel access to the data.

• Implementing an array of storage elements as individual registers may help performance, but it consumes a large area and increases power consumption, as illustrated below.
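As a hedged sketch, complete partitioning (assuming the ARRAY_PARTITION pragma's "complete" mode) turns each element into an individual register; the names here are illustrative, and the resource numbers in the table below come from the slides, not from this code.

// Complete partitioning: the 8-element buffer becomes 8 registers, so
// all elements can be read in the same clock cycle, at the cost of more
// LUTs/FFs and higher power than a RAM-based buffer.
int sum_all(int in[8]) {
    int buf[8];
#pragma HLS ARRAY_PARTITION variable=buf complete
    for (int i = 0; i < 8; i++) {
        buf[i] = in[i];
    }
    int sum = 0;
    for (int i = 0; i < 8; i++) {
#pragma HLS UNROLL
        sum += buf[i];   // all eight reads can happen in parallel
    }
    return sum;
}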
Device: xa7a100tfgg484-2i, 2-D input array of size N = 128×128

Input array implementation:   Dual-port RAM    Independent Registers
LUT                           1642             10778
FF                            835              9548
Power                         246              2031