Enabling partial reconfiguration and low latency routing using segmented FPGA NoCs

Kizhepatt Vipin, Jan Gray, Nachiket Kapre

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deflection-routed FPGA overlay NoCs such as Hoplite suffer from high worst-case routing latencies due to the penalty of deflections at large system sizes. Segmentation of communication channels in such NoCs can (1) reduce worst-case packet routing latencies for FPGA traffic, (2) enable efficient composition of multi-application NoC workloads, and (3) ease the burden of supporting Partial Reconfiguration (PR) for FPGAs. We segment the NoC by inserting isolation multiplexers along its links to split traffic into different regions. This segmentation reduces routing latencies by keeping deflected packets within the segmented region. Segmentation can be applied either statically, using configuration bits that can be changed per application phase (on the order of 1000s of cycles), or fully dynamically, on a per-cycle basis based on packet addresses. For the Xilinx VC709 FPGA board, we build an 8×8 deflection-routed NoC with 4×4 statically fracturable regions and 256b-wide links; supporting fracturing costs 6% extra LUT resources and no extra pipelining, while running at >200 MHz. We comprehensively outperform the CONNECT Torus NoC by 2-3× across various traffic patterns while using 4-7× fewer FPGA resources. When considering real-world traffic extracted from Sniper simulations of multi-processor PARSEC benchmarks, we observe up to 2.7× improvement in throughput for an 8×8 NoC with static segmentation. With fully dynamic segmentation applied to a large 30×7 NoC with 300b links, hosting a 1,680-core parallel processor, segmenting the NoC into six 5×7 segments uses an additional 1% of device LUTs but improves throughput by as much as 2.5× for LOCAL traffic.
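
The latency argument in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes Hoplite's unidirectional 2D torus topology, and it assumes that the isolation multiplexers turn each segment into a smaller, self-contained torus. Under those assumptions the longest deflection-free path in a W×H region is (W-1)+(H-1) hops. The function names `max_hops` and `segment_count` are hypothetical, and the extra cycles caused by deflections under load are not modelled.

```python
def max_hops(width: int, height: int) -> int:
    """Longest deflection-free path in a unidirectional width x height torus.

    Assumption: packets travel in one direction per dimension, so the
    farthest destination is (width - 1) hops away in X plus (height - 1)
    hops away in Y.
    """
    if width < 1 or height < 1:
        raise ValueError("dimensions must be positive")
    return (width - 1) + (height - 1)


def segment_count(noc, segment):
    """Number of equal-sized segments that tile the NoC exactly."""
    (noc_w, noc_h), (seg_w, seg_h) = noc, segment
    if noc_w % seg_w or noc_h % seg_h:
        raise ValueError("segment does not tile the NoC")
    return (noc_w // seg_w) * (noc_h // seg_h)


if __name__ == "__main__":
    # 8x8 NoC with 4x4 statically fracturable regions (sizes from the abstract).
    print(max_hops(8, 8), max_hops(4, 4), segment_count((8, 8), (4, 4)))
    # 30x7 NoC split into 5x7 segments (sizes from the abstract).
    print(max_hops(30, 7), max_hops(5, 7), segment_count((30, 7), (5, 7)))
    # Cores per router if the 1,680 cores are spread evenly (an assumption).
    print(1680 // (30 * 7))
```

Under these assumptions the uncongested worst case drops from 14 to 6 hops for the 8×8 NoC and from 35 to 10 hops for the 30×7 NoC. The 30×7 split yields six segments, matching the abstract, and 1,680 cores over 210 routers works out to 8 cores per router if they are spread evenly.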

Original language: English
Title of host publication: 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017
Editors: Diana Gohringer, Dirk Stroobandt, Nele Mentens, Marco Santambrogio, Jari Nurmi
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9789090304281
DOIs: https://doi.org/10.23919/FPL.2017.8056777
Publication status: Published - Oct 2 2017
Externally published: Yes
Event: 27th International Conference on Field Programmable Logic and Applications, FPL 2017 - Gent, Belgium
Duration: Sep 4 2017 to Sep 6 2017

Conference

Conference: 27th International Conference on Field Programmable Logic and Applications, FPL 2017
Country: Belgium
City: Gent
Period: 9/4/17 to 9/6/17

Fingerprint

Field programmable gate arrays (FPGA)
Throughput
Network-on-chip
Chemical analysis
Costs

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Vipin, K., Gray, J., & Kapre, N. (2017). Enabling partial reconfiguration and low latency routing using segmented FPGA NoCs. In D. Gohringer, D. Stroobandt, N. Mentens, M. Santambrogio, & J. Nurmi (Eds.), 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017 [8056777]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/FPL.2017.8056777

@inproceedings{6e39ca867ce64ca6878b0ded2fe54c84,
title = "Enabling partial reconfiguration and low latency routing using segmented FPGA NoCs",
author = "Kizhepatt Vipin and Jan Gray and Nachiket Kapre",
year = "2017",
month = "10",
day = "2",
doi = "10.23919/FPL.2017.8056777",
language = "English",
editor = "Diana Gohringer and Dirk Stroobandt and Nele Mentens and Marco Santambrogio and Jari Nurmi",
booktitle = "2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN
T1 - Enabling partial reconfiguration and low latency routing using segmented FPGA NoCs
AU - Vipin, Kizhepatt
AU - Gray, Jan
AU - Kapre, Nachiket
PY - 2017/10/2
Y1 - 2017/10/2
UR - http://www.scopus.com/inward/record.url?scp=85034423312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85034423312&partnerID=8YFLogxK
U2 - 10.23919/FPL.2017.8056777
DO - 10.23919/FPL.2017.8056777
M3 - Conference contribution
BT - 2017 27th International Conference on Field Programmable Logic and Applications, FPL 2017
A2 - Gohringer, Diana
A2 - Stroobandt, Dirk
A2 - Mentens, Nele
A2 - Santambrogio, Marco
A2 - Nurmi, Jari
PB - Institute of Electrical and Electronics Engineers Inc.
ER -