2015 International Integrated Circuit Computer-Aided Design (CAD) Software Contest

Routability-driven Macro Placement

Bauli Yang, Mike Wu, Shihying Liu, MediaTek Inc., Taiwan


Update

2015/07/15

- Update iccad2015_evaluate.pl
  - Add a third parameter, “margin”, that expands the die area defined by the macro placement. The default value is 0.05.
    - Ex. perl iccad2015_evaluate.pl adaptec1.fp adaptec1.aux 0.05
  - The input macro placement is now shifted so that its bottom-left corner sits at coordinate (10,10) before evaluation.
- Update Contest Evaluation Metric
  - The default margin value is 5%. In other words, P&R must execute successfully within 1.05 * (DIE AREA).
  - A 50% penalty on the total area is added for every 5% increase in margin.
  - The placer now runs with a designated target density (urate + 15%) based on the input utilization rate.
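As a rough illustration of the margin rule above, the sketch below computes a penalized area. It assumes the penalty is applied in whole 5% steps above the default margin and multiplies the total area; the function name `penalized_area` and the step-wise rounding are assumptions for illustration, and evaluation_metric.pdf remains the authoritative definition.

```python
def penalized_area(total_area: float, margin: float, default_margin: float = 0.05) -> float:
    """Apply a 50% area penalty per 5% of margin above the default (assumed step-wise)."""
    # Work in integer percent to avoid floating-point noise in the step count.
    extra_pct = round(margin * 100) - round(default_margin * 100)
    steps = max(0, -(-extra_pct // 5))  # ceiling division, never negative
    return total_area * (1.0 + 0.5 * steps)


print(penalized_area(1000.0, 0.05))  # 1000.0 (default margin, no penalty)
print(penalized_area(1000.0, 0.10))  # 1500.0 (one extra 5% step)
print(penalized_area(1000.0, 0.15))  # 2000.0 (two extra 5% steps)
```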

2015/06/21

- Update iccad2015_evaluate.pl
  - Function writeRoute now uses margin 0.05 instead of 0.02, because the placer sometimes places cells at the boundary of the placeable region, which causes routers to crash due to the unit conversion from placeable sites to routable bins.
  - The generated [testcase]Eval.route file defines a routing space equal to the bounding box of the macro placement * (1+margin).

2015/05/19

- Update adaptec1_M.zip (add spacing before node name)
- Modify iccad2015_evaluate.pl
  - The macro placement bounding box now considers the bottom-left corner
  - Include a Perl-based macro placement checker (the Python version is still available)


Introduction

In order to reduce product design cycles, reusable Intellectual Property (IP) modules and embedded memories are widely used. Based on this architecture, a modern very large-scale integration (VLSI) chip usually contains hundreds of hard macros and millions of standard cells with millions of connections, as shown in Fig. 1(a). The most popular placement methodology in industry is: (1) macros are placed at legal positions (see Fig. 1(b)) such that some cost metrics (e.g., routability and wirelength) are optimized; (2) after all macros are fixed, standard cells are placed without overlaps among macros and standard cells, as shown in Fig. 1(c). Following the placement stage, a global router realizes the connections with routing wires and outputs congestion information to evaluate the routability of the placement result (see Fig. 1(d)).

 

Fig. 1: Placement methodology with routability evaluation by global routing.


In the automatic placement and routing stage, macro placement (Fig. 1(b)) is a key step toward better placement and routing quality. Macros are usually placed close to chip boundaries, except in some special designs. Based on the macros’ positions, standard cells are placed in the remaining space (usually the core region). Once macros and standard cells are placed, a global router evaluates the routability of the placement result by connecting nets among macros and standard cells under limited routing resources. If routing demand exceeds routing resources in some regions, those regions become congested and degrade the routability of the placement result.

 



ICCAD 2015 MACRO PLACEMENT BENCHMARK SUITE and UTILITY SCRIPTS

BENCHMARK DESCRIPTION and FORMAT

 

- Given a set of unplaced macros, a set of unplaced standard cells, a circuit netlist, and a set of placeable sites, place the macros within a bounding box of minimal area while optimizing routability.
- Detailed explanation of the benchmark : Benchmark_Description.pdf

BENCHMARK CIRCUITS

- adaptec1_M (Sample macro placement output : adaptec1.fp)
- adaptec2_M (Sample macro placement output : adaptec2.fp)
- adaptec3_M (Sample macro placement output : adaptec3.fp)
- adaptec4_M (Sample macro placement output : adaptec4.fp)

 

 

CONTEST EVALUATION SCRIPT

- Benchmark Evaluation Script : iccad2015_evaluate.pl
  - Usage : iccad2015_evaluate.pl [circuit.fp] [circuit.aux]
  - Generated Graph Files
    1. Macro Placement Stage : [circuit].svg (open with a browser, ex. firefox adaptec1.svg)
    2. Standard Cell Placement Stage : [circuit].ntup.plt (open with gnuplot, ex. gnuplot adaptec1.ntup.plt)
    3. Global Routing Stage : [circuit].Max_H.congestion.plt (open with gnuplot, ex. gnuplot adaptec1.Max_H.congestion.plt)



EVALUATION and RANKING

STANDARD CELL PLACER FOR CONTEST EVALUATION

- NTUplace3

Related Publication : Tung-Chieh Chen, Zhe-Wei Jiang, Tien-Chang Hsu, Hsin-Chen Chen and Yao-Wen Chang, “NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 7, pp. 1228-1240, July 2008.

GLOBAL ROUTER FOR CONTEST EVALUATION

- NCTU-GR 2.0

Related Publication : Wen-Hao Liu, Wei-Chun Kao, Yih-Lang Li, and Kai-Yuan Chao, “Multi-Threaded Collision-Aware Global Routing with Bounded-Length Maze Routing,” In Proc. Design Automation Conference, pp. 200-205, 2010.

EVALUATION METRIC

- Contest evaluation flow : evaluation_flow.pdf
- Contest evaluation metric : evaluation_metric.pdf
  - Average Congestion Edge (ACE) :
    1. Related Publication : Y. Wei et al., “GLARE: Global and local wiring aware routability evaluation,” In Proc. Design Automation Conference, 2012.
  - Minimum spanning area of the macro placement
  - Execution Time
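To make the congestion metric more concrete, here is a minimal sketch of ACE: the average congestion over the top x% most congested global routing edges. The input list, the function names, the percentages (0.5, 1, 2, 5), the equal weighting, and the floor at 100 are assumptions borrowed from the ICCAD 2012 contest convention, not taken from evaluation_metric.pdf.

```python
import math


def ace(edge_congestion, top_percent):
    """Average congestion over the top `top_percent`% most congested edges."""
    if not edge_congestion:
        raise ValueError("need at least one edge")
    ranked = sorted(edge_congestion, reverse=True)
    count = max(1, math.ceil(len(ranked) * top_percent / 100.0))
    return sum(ranked[:count]) / count


def routing_congestion(edge_congestion, percents=(0.5, 1, 2, 5)):
    """Equal-weight mean of several ACE values, floored at 100 (assumed convention)."""
    mean_ace = sum(ace(edge_congestion, p) for p in percents) / len(percents)
    return max(100.0, mean_ace)


# Hypothetical congestion values (demand / capacity, in percent) for 200 edges.
edges = [120.0, 110.0] + [90.0] * 198
print(ace(edges, 0.5))            # 120.0 (top 1 edge)
print(ace(edges, 1))              # 115.0 (top 2 edges)
print(routing_congestion(edges))  # 108.125
```

A value of 100 under this convention means that even the most congested edges stay within capacity on average.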



CONTEST GUIDELINE and SUBMISSION INFORMATION

DELIVERABLES

- A binary that generates a macro placement solution for a given benchmark.
- The output of the macro placer should be an ASCII text file “[circuit].fp” with the locations of all macro blocks.
- If a contestant chooses to include third-party tools, the contestant is responsible for wrapping all tools in one static binary.

SUBMISSION GUIDELINE

- Submit one static binary
- Machine configuration : TBD
- The runtime of each macro placer is limited to 3 hours.



APPENDIX

DETAILED EVALUATION FLOW

- iccad2015_evaluate.pl executes the following scripts/binaries:
  1. Check macro overlap : check_overlap.py
     - Usage : python check_overlap.py -f [circuit.fp] -n [circuit.nodes]
  2. Execute ntuplace3 : ntuplace3
     - Usage : ./ntuplace3 -aux [circuit.aux]
  3. Execute NCTUgr : NCTUgr
     - Usage : ./NCTUgr ICCAD [circuit.aux] [circuit.ntup.pl] [ICCAD12.set] [circuit.gr]
     - NCTUgr includes the following library files and setting file : 1) POST9.dat 2) POWV9.dat 3) PORT9.dat 4) ICCAD12.NCTUgr.set
  4. Evaluate routability : evaluate_routability.pl
     - Usage : ./evaluate_routability.pl -p [circuit.aux] [circuit.ntup.pl] [circuit.gr]
     - Modified from iccad_evaluate_solution.pl (http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2012/problems/p2/p2.html)
- Routability driven placement (provided by NTU EDA Lab)
  1. NTUplace4
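The first step of the flow above rejects placements in which macros overlap. The following is a minimal sketch of such a check using a pairwise axis-aligned rectangle test; it is not the code of check_overlap.py, and the `Macro` record and function names are hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Macro(NamedTuple):
    name: str
    x: float       # lower-left corner
    y: float
    width: float
    height: float


def overlaps(a: Macro, b: Macro) -> bool:
    """True if the interiors intersect; macros that only touch do not overlap."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def find_overlaps(macros: List[Macro]) -> List[Tuple[str, str]]:
    """Return the names of every overlapping pair (quadratic in the macro count)."""
    return [(a.name, b.name) for a, b in combinations(macros, 2) if overlaps(a, b)]


# Hypothetical macros: o2 abuts o1, o3 overlaps the corner of o1.
macros = [Macro("o1", 0, 0, 10, 10), Macro("o2", 10, 0, 5, 5), Macro("o3", 8, 8, 4, 4)]
print(find_overlaps(macros))  # [('o1', 'o3')]
```

A quadratic scan is usually acceptable for a few hundred macros; a sweep-line approach would be the natural replacement for much larger inputs.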



REFERENCE

MACRO PLACEMENT

- Jacky Z. Yan, Natarajan Viswanathan and Chris Chu, “Handling Complexities in Modern Large-Scale Mixed-Size Placement”, DAC 2009
- Tung-Chieh Chen, Ping-Hung Yuh and Yao-Wen Chang, “MP-Trees: A Packing-Based Macro Placement Algorithm for Modern Mixed-Size Designs”, TCAD 2008
- Saurabh N. Adya and Igor L. Markov, “Consistent Placement of Macro-Blocks Using Floorplanning and Standard-Cell Placement”, ISPD 2002
- Yi-Fang Chen, Chau-Chin Huang, Chien-Hsiung Chiou, Yao-Wen Chang and Chang-Jen Wang, “Routability-Driven Blockage-Aware Macro Placement”, DAC 2014

BENCHMARK FORMAT

- Natarajan Viswanathan, Charles Alpert, Cliff Sze, Zhuo Li, Yaoguang Wei, “ICCAD-2012 CAD Contest in Design Hierarchy Aware Routability-Driven Placement and Benchmark Suite,” In Proc. ICCAD, pp. 345-348, 2012.

CONTEST GLOBAL PLACER/GLOBAL ROUTER EVALUATION

- Tung-Chieh Chen, Zhe-Wei Jiang, Tien-Chang Hsu, Hsin-Chen Chen and Yao-Wen Chang, “NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 7, pp. 1228-1240, July 2008.
- Wen-Hao Liu, Wei-Chun Kao, Yih-Lang Li, and Kai-Yuan Chao, “Multi-Threaded Collision-Aware Global Routing with Bounded-Length Maze Routing,” In Proc. Design Automation Conference, pp. 200-205, 2010.


Alpha Test

- Alpha Test Top 3
- Beta Test Top 4

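adaptec1

| TOP | Output | Area | RC | S. Area | Margin | Time | Time_PF | Total | Norm. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | NTUP3 | 108232812 | 109.44 | 138883996 | 0.00 | 67.54 | 1.98% | 141636060 | 1.00 |
| 1 | NTUP4 | 108232812 | 100.93 | 113421296 | 0.00 | | | 115668802 | 1.00 |
| 2 | NTUP3 | 124941636 | 114.88 | 180725574 | 0.00 | 129.52 | 3.92% | 187813413 | 1.33 |
| 2 | NTUP4 | 124941636 | 100.86 | 128168002 | 0.00 | | | 133194596 | 1.15 |
| 3 | NTUP3 | 215639904 | 101.38 | 224552961 | 0.00 | 0.11 | -5.00% | 213325313 | 1.51 |
| 3 | NTUP4 | 215639904 | 100.00 | 217322400 | 0.00 | | | 206456280 | 1.78 |
| 4 | NTUP3 | 326369736 | 100.00 | 326369736 | 0.00 | 0.3 | -5.00% | 310051249 | 2.19 |
| 4 | NTUP4 | 326369736 | 100.00 | 326369736 | 0.00 | | | 310051249 | 2.68 |

adaptec2

| TOP | Output | Area | RC | S. Area | Margin | Time | Time_PF | Total | Norm. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | NTUP3 | 195152064 | 123.43 | 332347660 | 0.00 | 102.96 | 1.74% | 338128057 | 1.00 |
| 1 | NTUP4 | 195152064 | 102.36 | 210213401 | 0.00 | | | 213869563 | 1.00 |
| 2 | NTUP3 | 249279360 | 121.34 | 408853731 | 0.00 | 121.9 | 2.24% | 418022138 | 1.24 |
| 2 | NTUP4 | 249279360 | 110.66 | 328758197 | 0.00 | | | 336130489 | 1.57 |
| 3 | NTUP3 | 456781500 | 109.03 | 580500870 | 0.00 | 0.14 | -5.00% | 551475827 | 1.63 |
| 3 | NTUP4 | 456781500 | 100.00 | 456781500 | 0.00 | | | 433942425 | 2.03 |
| 4 | NTUP3 | 629096832 | 102.25 | 680860542 | 0.00 | 0.17 | -5.00% | 646817515 | 1.91 |
| 4 | NTUP4 | 629096832 | 100.00 | 629096832 | 0.00 | | | 597641990 | 2.79 |

adaptec3

| TOP | Output | Area | RC | S. Area | Margin | Time | Time_PF | Total | Norm. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | NTUP3 | 346684800 | 105.99 | 405097214 | 0.00 | 95.84 | 1.83% | 412502950 | 1.00 |
| 1 | NTUP4 | 346684800 | 100.53 | 352239628 | 0.00 | | | 358679055 | 1.00 |
| 2 | NTUP3 | 352331316 | 115.22 | 513237031 | 0.00 | 107.04 | 2.16% | 524310094 | 1.27 |
| 2 | NTUP4 | 352331316 | 104.06 | 395443610 | 0.00 | | | 403975286 | 1.13 |
| 3 | NTUP3 | 800739858 | 100.00 | 800739858 | 0.00 | 0.19 | -5.00% | 760702865 | 1.84 |
| 3 | NTUP4 | 800739858 | 100.00 | 800739858 | 0.00 | | | 760702865 | 2.12 |
| 4 | NTUP3 | 1256661216 | 100.00 | 1256689025 | 0.00 | 0.37 | -5.00% | 1193854574 | 2.89 |
| 4 | NTUP4 | 1256661216 | 100.00 | 1256661216 | 0.00 | | | 1193828155 | 3.33 |

adaptec4

| TOP | Output | Area | RC | S. Area | Margin | Time | Time_PF | Total | Norm. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | NTUP3 | 207277440 | 115.32 | 302569947 | 0.00 | 166.86 | 3.37% | 312773193 | 1.05 |
| 1 | NTUP4 | 207277440 | 101.59 | 217909836 | 0.00 | | | 225258179 | 1.00 |
| 2 | NTUP3 | 19999552 | 115.69 | 294113774 | 0.00 | 91.21 | 1.57% | 298738132 | 1.00 |
| 2 | NTUP4 | 19999552 | 120.16 | 320971438 | 0.00 | | | 326018079 | 1.45 |
| 3 | NTUP3 | 334527876 | 100.46 | 339161532 | 0.00 | 0.14 | -5.00% | 322203455 | 1.08 |
| 3 | NTUP4 | 334527876 | 100.00 | 335309940 | 0.00 | | | 318544443 | 1.41 |
| 4 | NTUP3 | 527573835 | 100.02 | 527960796 | 0.00 | 0.37 | -5.00% | 501562756 | 1.68 |
| 4 | NTUP4 | 533576832 | 100.00 | 533576832 | 0.00 | | | 506897990 | 2.25 |

Final

| TOP | Output | Final |
|---|---|---|
| 1 | NTUP3 | 1.00 |
| 1 | NTUP4 | 1.00 |
| 2 | NTUP3 | 1.19 |
| 2 | NTUP4 | 1.32 |
| 3 | NTUP3 | 1.50 |
| 3 | NTUP4 | 1.84 |
| 4 | NTUP3 | 2.14 |
| 4 | NTUP4 | 2.76 |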

 



FAQ

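1. Some of the released files are formatted differently. For example, in adaptec1.nodes and adaptec1.pl there is no whitespace before o1, o2, etc., but in adaptec2.nodes, adaptec2.pl, adaptec3.nodes, adaptec3.pl, adaptec4.nodes, and adaptec4.pl there is whitespace before o1, o2, etc. Is this expected?

ANS: A node name with or without leading whitespace is legal either way. However, so that contestants can focus on the main problem, all .nodes and .pl files will be unified to include the leading whitespace.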

 

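2. Your flow uses check_overlap.py, but when we run this tool an error occurs and it cannot be used. Is there a problem with the file? Thank you.

ANS: The Python version used is 2.7.3. Please check whether your OS supports it.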

 

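3. After opening the .pl and .nodes files, I found that, combining the contents of the two files, the nodes fall roughly into three types:

FIXED

(1) Coordinate (0,0), size given

MOVABLE :

(2) Coordinate (0,0), size given

(3) Coordinate given, size (0,0)

For type 1, my understanding is that we are expected to design a macro placer to handle them.

Type 2 is handled by the NTU placer. For type 3, however, I am not sure whether such nodes can be moved by the NTU placer, or whether they cannot be moved at all, so that the macro placer must treat each of them as a point obstacle that cannot be overlapped.

ANS: Type 3 can simply be ignored; it does not cause overlap with other nodes. You only need to determine the coordinates of type 1 (marked as FIXED). The initial positions defined in the .pl file carry little meaning.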
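The three node types from this question can be separated with a few comparisons. The sketch below is illustrative only: the `Node` record, its field names, and the returned labels are hypothetical, and reading the size and the FIXED flag from the .nodes and .pl files is left out.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    width: float
    height: float
    fixed: bool  # True when the .pl entry is marked FIXED


def classify(node: Node) -> str:
    """Map a node to one of the three types discussed in this FAQ item."""
    if node.width == 0 and node.height == 0:
        return "ignore"          # type 3: zero-size node, cannot cause overlap
    if node.fixed:
        return "macro"           # type 1: position decided by the macro placer
    return "standard_cell"       # type 2: left to the standard cell placer


# Hypothetical nodes, not taken from any benchmark file.
nodes = [Node("o1", 12, 12, False), Node("o2", 400, 300, True), Node("p1", 0, 0, False)]
print([classify(n) for n in nodes])  # ['standard_cell', 'macro', 'ignore']
```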

 

4. Do we need to write our own program to generate the .route file?

ANS : No. The evaluation script generates [testcase]_Eval.route by itself.

 

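5. The program under CONTEST EVALUATION SCRIPT cannot be started, as shown in the figure below. How can this be solved?

ANS: To execute the script from the shell, you need to change its access permissions. Ex. chmod 755 iccad2015_evaluate.pl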

 

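6. Following up on question 5, we tried again using the method you mentioned, but it still cannot be executed (see attachment). How can this be solved? Thank you.

ANS: Please use “perl iccad2015_evaluate.pl [circuit.fp] [circuit.aux]”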

 

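7. There are nets with net degree 1, like the one below. Does that mean this net is not connected to any other node?

NetDegree : 1   n116279

o211410 I : 0.000000 0.000000

ANS: Yes. A single-node net means the net connects only to this node. There are also nodes that are not connected to any other node at all (spare cells). All of these cells can be ignored in the HPWL calculation.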

 

8. For a macro node like o211431, is this a net that connects the node to itself?

NetDegree : 2   n138150

o211431 O : -131.000000 373.000000

o211431 I : -131.000000 336.000000

ANS: Yes. It means that two pins of this node are connected to each other.

 

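9. In net n7807, which has the largest net degree, all nodes are input pins and there is no output pin. Which node drives this net?

ANS: If none is defined, there is none. Input/output does not affect the HPWL calculation. Signal direction matters only for timing analysis; the input/output direction of a net has no effect in this problem.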
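FAQ items 7 to 9 all come down to how a net contributes to half-perimeter wirelength (HPWL). As a minimal sketch, HPWL is the half perimeter of the bounding box of a net's pins, so a single-pin net contributes nothing and pin direction plays no role. The function name is hypothetical, and resolving absolute pin positions from node positions plus pin offsets is assumed to happen before the call.

```python
def net_hpwl(pins):
    """Half-perimeter wirelength of one net given absolute (x, y) pin positions."""
    if len(pins) < 2:
        return 0.0  # single-pin nets add nothing
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Pin offsets of net n138150 from question 8; both pins belong to node o211431,
# so the node position cancels out and the offsets can be used directly.
print(net_hpwl([(-131.0, 373.0), (-131.0, 336.0)]))  # 37.0
# Single-pin net n116279 from question 7.
print(net_hpwl([(0.0, 0.0)]))  # 0.0
```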

 

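10. I also have a question about the Macro Placement Area reported at the end by iccad2015_evaluate.pl.

According to line 250 of evaluate_routability.pl:

$chip_info->{area} = $chip_info->{TRX} * $chip_info->{TRY};

That is, the area is the macro placement bounding box TRX * TRY. But shouldn't the correct area be (TRX - BLX) * (TRY - BLY)? Does this need to be fixed?

ANS: Yes. The updated script will be posted as soon as possible. In principle, however, we still prefer that macros be packed toward the bottom-left corner, which is closer to actual practice.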
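The corrected area from this question can be sketched as follows, assuming each macro is given as a hypothetical `(x, y, width, height)` tuple with `(x, y)` its lower-left corner. The result matches TRX * TRY only when the bounding box starts at the origin.

```python
def bounding_box_area(macros):
    """macros: list of (x, y, width, height) with (x, y) the lower-left corner."""
    blx = min(x for x, _, _, _ in macros)
    bly = min(y for _, y, _, _ in macros)
    trx = max(x + w for x, _, w, _ in macros)
    try_ = max(y + h for _, y, _, h in macros)
    return (trx - blx) * (try_ - bly)


# Hypothetical placement whose bottom-left corner is at (10, 10).
macros = [(10, 10, 20, 30), (40, 10, 10, 10)]
print(bounding_box_area(macros))  # 1200, whereas TRX * TRY would give 50 * 40 = 2000
```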

 

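11. Is there any limit on the width and height of the chip boundary?

ANS: There is no limit on the chip width or height; the problem asks contestants to make the macro placement area as small as possible. The chip dimensions are derived from the contestant's [testcase].fp file, from which a [testcase]Eval.scl file is generated automatically. The smallest subrow origin in the .scl file is the leftmost point of the chip, and the highest row coordinate plus the row height is the topmost point of the chip. In short, contestants only need to produce the .fp file; the remaining files are generated automatically from it.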

 

12.            If I did not submit files in alpha test, may I still submit files for beta test?

ANS: Yes. You are encouraged to submit files for beta test and final test.

 

13. What is the difference between NTUP3 and NTUP4, and how can we get the detailed evaluation flow of NTUP4?

ANS: NTUplace3 is a wirelength/density-driven placer; NTUplace4 is a routability-driven placer. The overall metric is still based on NTUplace3; NTUplace4 serves as supplementary information.

 

14. This web page, http://cad-contest.el.cycu.edu.tw/problem_D/document/Evaluation_Metric.pd, says that the runtime factor is capped at +/-10%, but I found that the runtime factors of the third and fourth teams are capped at +/-5% in the beta test.

ANS: The runtime factor is capped at 5% for all teams. The reason is that we want to weight runtime less and placement quality more.

 

15. Could you give us the exact formulation of the runtime factor?

ANS: The runtime factor is based on the average runtime of all teams: every 2x speedup/slowdown gives a +/-2% advantage. The exact formula requires the runtime values from all submissions. http://cad-contest.el.cycu.edu.tw/problem_D/document/Evaluation_Metric.pdf
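One plausible reading of this rule, combined with the 5% cap from question 14, is sketched below. The logarithmic interpolation between doublings, the function name, and the parameter names are assumptions; the linked Evaluation_Metric.pdf is the authoritative definition.

```python
import math


def runtime_factor(runtime, average_runtime, per_doubling=0.02, cap=0.05):
    """Penalty (positive) or bonus (negative) relative to the average runtime."""
    # Each 2x slowdown adds `per_doubling`; each 2x speedup subtracts it.
    raw = per_doubling * math.log2(runtime / average_runtime)
    return max(-cap, min(cap, raw))


print(runtime_factor(200.0, 100.0))  # 0.02  (2x slower than average)
print(runtime_factor(50.0, 100.0))   # -0.02 (2x faster than average)
print(runtime_factor(1.0, 100.0))    # -0.05 (capped)
```

Under this reading, a scaled area would be multiplied by (1 + factor) to obtain the total.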

 

16. I found that in the new iccad2015_evaluate.pl script, the parameter "margin" only modifies the right bound margin but not the top bound margin. Could you please explain the reason for this?

ANS: This is due to an offset error between the bin size in the global router and the row height defined in the placer. Using the exact margin in the router/placer may cause the router to crash.