Optimize your database with duplicate data deletion

By James Paris | Posted: July 27, 2019
Topics/Categories: EDA - DFM

Whether you use OASIS or GDSII, unwanted duplicate cells can make their way into the final SoC database. Learn how to remove them.

Today’s system-on-chip (SoC) designs consist of many blocks and libraries, created by many different teams and companies. More often than not, all of these groups are working on their part of the design at the same time. All those design macros from both internal and external sources must be merged together and pass successfully through physical verification before the design can be sent to the foundry.

All that data coming together has created massive, monolithic design databases that have been steadily increasing in size with each new node. Only the introduction of the OASIS compressed format [1] helped forestall (for a time) the need for 1TB+ of disk storage for a single SoC design.

But here’s a dirty little secret — regardless of whether design teams use OASIS or GDSII formats, duplicate cells and cell instances almost always exist in the final SoC database, and they have a definite and negative impact on things like database sizes, application runtimes, and accuracy. Finding these duplications and eliminating those that are negatively impacting the design flow can help design teams optimize their implementation flows and reduce time to tapeout.

Data duplication

First of all, it’s important to understand that not all data duplication is the same. With so many sources using the same cell libraries or intellectual property (IP) blocks, intentional data duplication is frequently used to ensure data integrity, especially during the early stages of the design flow. At the same time, all these data inputs mean that unintentional data duplication also happens, and it can easily escape notice.

Library duplication

While all SoC block owners are responsible for their own design implementation flows, they often share libraries or IP blocks with other teams. During block implementation, block owners need to know that nothing in their blocks can be changed without their knowledge. Yet those same owners can (and do) change design data in their blocks without recognizing the impact of that change on other teams that share the cell data.

To avoid unwanted changes during design implementation, designers typically copy the shared libraries or IP blocks and add a unique suffix to the cell names (uniquification). Using these uniquified libraries in the complete SoC implementation avoids name collisions with other blocks, as shown in Figure 1.

Figure 1. Cell A is shared between blocks 1-4. To avoid unexpected changes to their version of the cell, block designers uniquify cell A by adding a unique suffix to the cell name (Mentor)

At the SoC level, these blocks and their uniquified cell components are independent of each other. When the design is complete, the unique names are no longer needed, but they typically remain in the database sent to the foundry. This can result in tens of thousands of duplications in a design database. In addition to increasing the database size, these duplications make any statistics that count cell placements by name incorrect or difficult to interpret. Figure 2 shows a simple example of a design that contains zero instances of the original cell A, but one instance each of its uniquified copies in blocks 1-4. Reverting these duplications to a single master reference can significantly reduce database size and complexity.

Figure 2. Uniquified cells in individual blocks can inflate database size at the SoC level (Mentor).

Unintentional duplication

Common sources for unintentional cell duplicates are via cells (which have a simple structure) or parameterized cells (pcells) that have the same layout, but different numeric codes following their names.

A duplicate instance occurs when a cell is placed directly on top of another cell instance that shares the same name and cell properties. These duplicate instances are unintentional placements that can be inadvertently created by a number of different sources, such as automated scripts, memory compilers, or data merging flows. Duplicate placements are common in SoC designs, but difficult to detect, since they look the same as a single placement from the layout/mask perspective.
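To see why duplicate placements are invisible from the mask perspective yet still matter, the hypothetical toy model below flattens placements into a set of rectangles. Placing the same via cell twice at the same location yields exactly the same mask shapes, while a count of instances by name doubles. The cell name, layer name, and the translation-only placement model are assumptions for illustration, and a set of rectangles is a simplification of a real polygon union.

```python
from collections import Counter

# Toy model (hypothetical): a placement is (cell_name, x, y), translation only,
# and a cell is a list of (layer, (x1, y1, x2, y2)) rectangles.
cell_shapes = {"via123": [("V1", (0, 0, 2, 2))]}

def mask_shapes(placements):
    """Flatten placements into the set of rectangles that would reach the mask."""
    shapes = set()
    for name, x, y in placements:
        for layer, (x1, y1, x2, y2) in cell_shapes[name]:
            shapes.add((layer, (x1 + x, y1 + y, x2 + x, y2 + y)))
    return shapes

single = [("via123", 10, 10)]
doubled = [("via123", 10, 10), ("via123", 10, 10)]  # same cell, same spot

print(mask_shapes(single) == mask_shapes(doubled))  # True: identical mask
print(Counter(name for name, _, _ in doubled))      # Counter({'via123': 2})
```

The set silently absorbs the second copy, which is exactly why a layout viewer cannot reveal it; anything that counts placements, however, sees two.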

Replacing duplicate uniquified cells

Renaming duplicate uniquified cells with a single reference name quickly clears thousands of duplicates from the typical SoC database. De-uniquifying duplicate cell names is a two-step process:

  • Determine which cells are duplicates of each other
  • Replace each duplicate with a single reference name
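The two steps above can be sketched in Python under simplifying assumptions: each cell is modeled as a flat list of (layer, rectangle) shapes, child references are ignored, and two cells count as duplicates only when their shapes match exactly in each cell's own coordinates. The cell names and the rule of choosing the alphabetically first name as the master are hypothetical; this illustrates the idea and is not how any Calibre tool compares cells.

```python
from collections import defaultdict

# Hypothetical toy database: cell name -> list of (layer, rectangle) shapes.
cells = {
    "A_block1": [("M1", (0, 0, 4, 1)), ("V1", (1, 0, 2, 1))],
    "A_block2": [("V1", (1, 0, 2, 1)), ("M1", (0, 0, 4, 1))],  # same shapes, other order
    "A_block3": [("M1", (0, 0, 4, 1)), ("V1", (1, 0, 2, 1))],
    "A_block4": [("M1", (0, 0, 4, 1)), ("V1", (1, 0, 2, 1))],
    "B_block1": [("M2", (0, 0, 3, 3))],
}

def signature(shapes):
    """Order-independent key for a cell's contents; the cell name is ignored."""
    return tuple(sorted(shapes))

def find_duplicate_sets(cells):
    """Step 1: group cell names whose contents are identical."""
    groups = defaultdict(list)
    for name, shapes in cells.items():
        groups[signature(shapes)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

def build_rename_map(dup_sets):
    """Step 2: map every duplicate to one master (here, the first name in sorted order)."""
    return {name: names[0] for names in dup_sets for name in names[1:]}

sets = find_duplicate_sets(cells)
print(sets)                    # [['A_block1', 'A_block2', 'A_block3', 'A_block4']]
print(build_rename_map(sets))  # {'A_block2': 'A_block1', 'A_block3': 'A_block1', 'A_block4': 'A_block1'}
```

Hashing a sorted, name-free signature keeps the grouping linear in the number of cells; a production tool would also have to account for hierarchy, arrays, and equivalent but differently fractured polygons.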

In Figure 2 (assuming no edits have been made to any of the cells), all four blocks are compared and identified as duplicates. Designers can then use a database modification tool to change the duplicate references to a common cell reference. In our example, any one of the blocks or the original cell A could be used as the replacement for all the duplicate blocks.

Once all the duplicates are replaced, an optimized database is output and verified against the original layout to ensure that no mask data was changed. This verification is most commonly performed with a layout vs. layout comparison, which focuses on comparing layers, rather than design hierarchies or cell names.
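The reason a layout vs. layout comparison suits this check is that it works on flattened layer geometry, so renaming cells or collapsing the hierarchy should not change its result. The minimal sketch below assumes a hypothetical model in which each cell holds rectangles plus translated child references; it flattens both the original and the optimized hierarchy and compares the resulting shape sets.

```python
# Hypothetical toy model: cells hold shapes and translated child references.
def flatten(cells, top, dx=0, dy=0):
    """Recursively flatten 'top' into a set of (layer, rect) in top-level coordinates."""
    out = set()
    cell = cells[top]
    for layer, (x1, y1, x2, y2) in cell["shapes"]:
        out.add((layer, (x1 + dx, y1 + dy, x2 + dx, y2 + dy)))
    for child, (x, y) in cell["refs"]:
        out |= flatten(cells, child, dx + x, dy + y)
    return out

original = {
    "TOP": {"shapes": [], "refs": [("A_block1", (0, 0)), ("A_block2", (10, 0))]},
    "A_block1": {"shapes": [("M1", (0, 0, 4, 1))], "refs": []},
    "A_block2": {"shapes": [("M1", (0, 0, 4, 1))], "refs": []},
}
optimized = {
    "TOP": {"shapes": [], "refs": [("A_block1", (0, 0)), ("A_block1", (10, 0))]},
    "A_block1": {"shapes": [("M1", (0, 0, 4, 1))], "refs": []},
}

print(flatten(original, "TOP") == flatten(optimized, "TOP"))  # True
```

Cell names never appear in the flattened output, which mirrors the point that the comparison focuses on layers rather than hierarchy or names.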

But don’t forget the unintentional data duplications! The process for finding and eliminating these duplications is essentially the same, and uses the same set of tools.

For a practical example, we’ll show how the duplicate cell replacement flow is supported in the Calibre toolset. The Calibre DBdiff utility is first used to identify and group cells that have different names but the same layout contents. The Calibre DESIGNrev layout editor can then replace each duplicate placement with the user-selected cell name and generate the optimized database. Finally, a layout vs. layout comparison is run with the Calibre nmDRC XOR utility to validate the optimized database.

Finding duplicate cells

Designers use the Calibre DBdiff utility’s ‘-automatch’ and ‘-multimatch’ command-line options to find cells in an input design that contain the same objects but have different names. Although the Calibre DBdiff utility is typically used to compare two different layout databases, it can also be run on a single database by specifying the same input file for both input designs, ‘-design’ and ‘-refdesign’, as shown below.

Cell duplication snippet 1

For designs in which text or shape properties may distinguish cells that are otherwise geometrically identical (and therefore NOT the same), the ‘-comparetext’ and/or ‘-compareproperties’ command-line options can be added.
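The effect of including text in the comparison can be illustrated with a hypothetical sketch: two pad cells share identical geometry but carry different labels. Compared on geometry alone they look like duplicates; adding the labels to the comparison key separates them. The cell names, labels, and the `compare_text` parameter are illustrative assumptions that only mirror the idea behind ‘-comparetext’; this is not the Calibre DBdiff algorithm.

```python
# Hypothetical toy cells: geometry plus text labels (layer, string, position).
cells = {
    "pad_a": {"shapes": [("M1", (0, 0, 5, 5))], "texts": [("M1", "VDD", (2, 2))]},
    "pad_b": {"shapes": [("M1", (0, 0, 5, 5))], "texts": [("M1", "VSS", (2, 2))]},
}

def signature(cell, compare_text=False):
    """Comparison key: geometry always, text labels only when requested."""
    key = (tuple(sorted(cell["shapes"])),)
    if compare_text:
        key += (tuple(sorted(cell["texts"])),)
    return key

def same_contents(a, b, compare_text=False):
    return signature(cells[a], compare_text) == signature(cells[b], compare_text)

print(same_contents("pad_a", "pad_b"))                     # True: geometry alone matches
print(same_contents("pad_a", "pad_b", compare_text=True))  # False: labels differ
```

Merging pad_a and pad_b would silently change a supply label, which is why the stricter comparison matters when labels carry meaning.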

Duplicate cell names are output as sets using the ‘-report’ option. For example, ‘set 1’ below has five cell names that contain the same layout objects. Each numbered set represents a group of cells that have the same contents, but different names (from intentional cell duplications).


The difference sets for both input designs are listed in the report. In this flow, both inputs are the same, so unintentional duplicates are listed later in the report file under the heading ‘GEOMETRICAL EQUIVALENT CELLS IN REFERENCE DESIGN’.

Renaming duplicate cells

After all the intentional and unintentional duplicate names are identified, the designer uses them to create a cell mapping file for the Calibre DESIGNrev filemerge utility. For each set, the designer chooses one cell master name to replace the rest in the set. In our example, the original cell A is not found in the design, so ‘A_block1’ is chosen as the group master.

In ‘set 1’, ‘A_block1’ is the selected master name, and a map cell file is created to replace ‘A_block[2-4]’ with ‘A_block1’:
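Generating the mapping from the reported sets is easy to script. The sketch below is hypothetical: it takes duplicate sets and the designer’s chosen master for each, and emits one "old_name new_name" line per replaced cell. That two-column text layout is an assumption for illustration, not the documented Calibre DESIGNrev map file syntax.

```python
# Hypothetical sketch: turn duplicate sets into "old_name new_name" lines.
# The two-column layout is an assumption, not the DESIGNrev map-file syntax.
def map_lines(dup_sets, masters):
    lines = []
    for set_id, names in dup_sets.items():
        master = masters[set_id]
        if master not in names:
            raise ValueError(f"master {master!r} is not in set {set_id}")
        lines += [f"{name} {master}" for name in names if name != master]
    return lines

dup_sets = {1: ["A_block1", "A_block2", "A_block3", "A_block4"]}
masters = {1: "A_block1"}  # designer's choice for set 1
print("\n".join(map_lines(dup_sets, masters)))
```

Checking that the master belongs to its own set guards against a copy-and-paste mistake that would otherwise map cells onto an unrelated layout.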


The map cell list can be simplified when cells with similar names are reported as having the same layout. If the designer is certain that all cells whose names share a common string and differ only in a substring (such as a block suffix) have identical layouts, the change can be made using wildcards:
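The "if the designer is certain" condition matters because a wildcard can match more than the verified set. The hypothetical sketch below uses Python’s standard `fnmatch` module to expand a pattern against the design’s cell names and flags any match that is not in the verified duplicate set. The cell name ‘A_block4_mod’ is invented for illustration, and `fnmatch` patterns stand in for whatever wildcard syntax the real map file accepts.

```python
from fnmatch import fnmatchcase

# Hypothetical cell list; A_block4_mod was never reported as a duplicate.
cell_names = ["A_block1", "A_block2", "A_block3", "A_block4", "A_block4_mod"]
verified_duplicates = {"A_block1", "A_block2", "A_block3", "A_block4"}

def expand(pattern, master, names):
    """Expand a wildcard rule into explicit name -> master replacements."""
    return {n: master for n in names if fnmatchcase(n, pattern) and n != master}

broad = expand("A_block*", "A_block1", cell_names)
unsafe = sorted(n for n in broad if n not in verified_duplicates)
print(unsafe)  # ['A_block4_mod']: the broad wildcard would rename an unverified cell

narrow = expand("A_block[2-4]", "A_block1", cell_names)
print(sorted(narrow))  # ['A_block2', 'A_block3', 'A_block4']
```

Expanding the pattern before applying it turns the designer’s certainty into something checkable against the duplicate report.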

When the map file is complete, the designer runs the Calibre DESIGNrev filemerge utility to implement the changes in the design database.
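Conceptually, applying the map means pointing every reference at its master and dropping the now-unused duplicate definitions. The sketch below shows that idea on a hypothetical toy hierarchy (shapes plus translated child references); it is not the implementation of the Calibre DESIGNrev filemerge utility.

```python
# Hypothetical toy model of a cell-replacement pass.
def apply_map(cells, rename):
    """Point every reference at its master and drop the replaced cell definitions."""
    out = {}
    for name, cell in cells.items():
        if name in rename:
            continue  # definition replaced by the master
        refs = [(rename.get(child, child), pos) for child, pos in cell["refs"]]
        out[name] = {"shapes": list(cell["shapes"]), "refs": refs}
    return out

cells = {
    "TOP": {"shapes": [], "refs": [("A_block1", (0, 0)), ("A_block2", (10, 0)),
                                   ("A_block3", (20, 0)), ("A_block4", (30, 0))]},
    **{f"A_block{i}": {"shapes": [("M1", (0, 0, 4, 1))], "refs": []} for i in range(1, 5)},
}
rename = {"A_block2": "A_block1", "A_block3": "A_block1", "A_block4": "A_block1"}

optimized = apply_map(cells, rename)
print(sorted(optimized))                                 # ['A_block1', 'TOP']
print([child for child, _ in optimized["TOP"]["refs"]])  # ['A_block1', 'A_block1', 'A_block1', 'A_block1']
```

Placement positions are untouched, so the flattened geometry is unchanged; only the number of cell definitions shrinks.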


Note: Cell replacement counts can exceed 200,000 in a full SoC. Focusing duplicate cell replacement on targeted cells at the block level (rather than on all duplicates) can improve throughput. In our example, designers can replace all instances of ‘A_block[1-4]’ with the original cell A by adding an OASIS file containing cell A as another input to the Calibre DESIGNrev filemerge utility.


Database validation

Data validation with the Calibre nmDRC XOR process requires an XOR rule file, which can be generated using the Calibre DBdiff utility based on the unoptimized and optimized database inputs, and a list of the layout base layers (contacts, polys, and diffusions).


The output Calibre SVRF XOR rule decks include the design database specifications and XOR rule for each layer found in the input database. Now the databases can be compared using the Calibre nmDRC XOR process.
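What a per-layer XOR reports can be illustrated with a simplified stand-in: flatten both databases to rectangles and, for each base layer, take the symmetric difference. The layer names and shapes below are hypothetical, and a set symmetric difference is only an approximation of a true geometric XOR (it would flag shapes that cover the same area but are fractured differently). It is not SVRF and not how Calibre nmDRC computes XOR.

```python
# Simplified stand-in for a layer-by-layer XOR on flattened rectangles.
BASE_LAYERS = ["CONT", "POLY", "DIFF"]  # hypothetical layer names

def xor_by_layer(before, after, layers):
    """Return {layer: shapes present in only one database} for layers with differences."""
    diffs = {}
    for layer in layers:
        a = {rect for lyr, rect in before if lyr == layer}
        b = {rect for lyr, rect in after if lyr == layer}
        if a ^ b:
            diffs[layer] = sorted(a ^ b)
    return diffs

before = {("POLY", (0, 0, 1, 8)), ("DIFF", (0, 2, 6, 4)), ("CONT", (4, 2, 5, 3))}
after_ok = set(before)
after_bad = (before - {("CONT", (4, 2, 5, 3))}) | {("CONT", (4, 3, 5, 4))}

print(xor_by_layer(before, after_ok, BASE_LAYERS))   # {}
print(xor_by_layer(before, after_bad, BASE_LAYERS))  # {'CONT': [(4, 2, 5, 3), (4, 3, 5, 4)]}
```

An empty result is the goal after optimization; any entry points at a layer where the replacement flow changed mask data.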


Any differences reported at this stage could indicate an error in the replacement flow and should be investigated and resolved. It may also be reasonable to run additional checks, such as design rule checking (DRC) or layout vs. schematic (LVS), at this point to ensure data integrity.

Removing duplicate instances

Internal or external IP should always be checked for duplicate instances before it is integrated into an SoC design. Otherwise, downstream tools operating on duplicate data may miscount cell instances and produce unexpected results. When duplications are detected, they should also be traced back to their source, so that methodology changes can be made to prevent or reduce future duplications. Typical design databases contain duplicate instances numbering in the thousands, but some can have tens of millions, or even billions, of duplicate placements.

Detecting duplicate instances in a two-dimensional viewer is difficult (Figure 3). Designers can use a layout viewer with manual inspection to confirm the existence of duplicate instances. Scripts can be written to print out and sort the placements for every reference, enabling designers to scan for duplicates. Both techniques are time-consuming and prone to human error, and neither actually removes the duplicate instances.
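The script-based approach described above can at least have its scanning step automated. In the hypothetical sketch below, placements are records of (parent, reference, x, y, orientation); sorting makes identical placements adjacent, and comparing each record with its predecessor reports every extra copy. The record layout and names are assumptions; as the text notes, this only finds duplicates and removes nothing.

```python
# Hypothetical placement records: (parent, reference, x, y, orientation).
placements = [
    ("block_xyz", "via123", 100, 200, "R0"),
    ("block_xyz", "via456", 300, 200, "R0"),
    ("block_xyz", "via123", 100, 200, "R0"),  # stacked on the first one
    ("block_xyz", "via123", 120, 200, "R0"),
]

def adjacent_duplicates(records):
    """Sort so identical placements become neighbors, then compare each to the previous."""
    ordered = sorted(records)
    return [cur for prev, cur in zip(ordered, ordered[1:]) if cur == prev]

print(adjacent_duplicates(placements))  # [('block_xyz', 'via123', 100, 200, 'R0')]
```

Sorting costs O(n log n), which is manageable for thousands of placements but becomes a real burden at the tens of millions mentioned above.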

Figure 3. Duplicated cells placed in a layout on top of one another have no impact on the two-dimensional mask, but can cause issues with some downstream applications (Mentor).

Existing EDA tools provide automated solutions that designers can use to find and eliminate these duplicate placements quickly and accurately.

Going back to our real-world example, designers can use the Calibre nmDRC tool to identify and avoid potential issues with duplicate placements. After completing a Calibre nmDRC run, designers can simply scan the transcript for the ‘ELIMINATING DUPLICATE PLACEMENTS’ header to find any duplicate cells.
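Scanning the transcript is easy to script. In the sketch below, only the ‘ELIMINATING DUPLICATE PLACEMENTS’ header string comes from the article; the sample transcript lines are placeholders rather than real Calibre output, and the file path passed to `scan_transcript` is whatever your run produced.

```python
# Only the header string comes from the article; sample lines are placeholders.
HEADER = "ELIMINATING DUPLICATE PLACEMENTS"

def find_header(lines):
    """Return the 1-based line numbers of transcript lines containing the header."""
    return [i for i, line in enumerate(lines, start=1) if HEADER in line]

def scan_transcript(path):
    """Scan a transcript file on disk for the header."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_header(f)

sample = [
    "<earlier transcript output>",
    "ELIMINATING DUPLICATE PLACEMENTS",
    "<duplicate placement details>",
]
print(find_header(sample))  # [2]
```

An empty list means the header never appeared in the transcript.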


Designers can then use the Calibre DESIGNrev layout editor to automatically remove these duplicate placements from the design. Removal may be performed at the block level, or for a full SoC design, using the ‘delete duplicate ref’ command in either batch or graphical mode. To use the graphical mode, the designer opens the layout in the Calibre DESIGNrev layout editor, and executes the following function in the command window:
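The core of duplicate placement removal is "keep the first copy of each identical placement." The hypothetical sketch below models a placement as (parent, reference, location, orientation) and treats two placements as duplicates only when all four fields match. That equality rule is an assumption; it is not the definition used by the ‘delete duplicate ref’ command.

```python
# Hypothetical toy model; not the DESIGNrev "delete duplicate ref" implementation.
def delete_duplicate_refs(placements):
    """Keep the first copy of each identical placement; return (kept, removed)."""
    seen, kept, removed = set(), [], []
    for p in placements:
        if p in seen:
            removed.append(p)
        else:
            kept.append(p)
            seen.add(p)
    return kept, removed

placements = [
    ("block_xyz", "via123", (100, 200), "R0"),
    ("block_xyz", "via123", (100, 200), "R0"),
    ("block_xyz", "via123", (100, 200), "MX"),  # different orientation: not a duplicate
]
kept, removed = delete_duplicate_refs(placements)
print(len(kept), len(removed))  # 2 1
```

Returning the removed records, rather than discarding them, keeps the information needed to trace duplicates back to their source.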


Writing the duplicate name, parent, and location to a file with the ‘-outfile’ option gives designers the opportunity to track down the source of the duplicates. For example, the end of the report below shows seven duplicate instances of ‘via123’ and ‘via456’ found in ‘block_xyz’.
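Once duplicate records (name, parent, location) are available, summarizing them per parent cell helps point to the flow that created them. The sketch below uses Python’s `collections.Counter` on hypothetical records; the record layout and values are invented for illustration and do not reproduce the ‘-outfile’ report format.

```python
from collections import Counter

# Hypothetical records: (reference, parent, (x, y)) for each removed duplicate.
removed = [
    ("via123", "block_xyz", (100, 200)),
    ("via123", "block_xyz", (140, 200)),
    ("via456", "block_xyz", (300, 200)),
]

per_cell = Counter((parent, ref) for ref, parent, _ in removed)
for (parent, ref), n in sorted(per_cell.items()):
    print(f"{parent}: {n} duplicate(s) of {ref}")
print("total duplicate instances:", sum(per_cell.values()))
```

Grouping by parent shows which block, and therefore which script, compiler, or merge step, is the likely source.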


The duplicate reference summary at the end of the file reports the total number of duplicate instances and arrays found in the design. This database included more than 52 million duplicate instances and over 1000 duplicate arrays.

Running the Calibre nmDRC XOR layout vs. layout comparison again after duplicate instance removal can confirm that the mask data remains unchanged.


Both intended and unintended data duplication occur in SoC design databases, leading to increased database sizes, longer application runtimes, and reduced accuracy. Intentional data duplication, such as uniquification of a library, must be carefully managed within a design flow to minimize such duplication in the final tape-out file. Because design teams are often unaware of unintended sources of duplicate data in the design flow, duplicate data and duplicate instance checking should always be performed as part of the IP acceptance criteria. Awareness and management of both intentional and unintentional duplicate data can help design companies manage resources efficiently, optimize their production implementation flows, and reduce time to tapeout.


[1] SEMI, “SEMI P39-0416 – Specification for OASIS® – Open Artwork System Interchange Standard,” Vol. Microlithography. Feb 1, 2016. https://ams.semi.org/ebusiness/standards/SEMIStandardDetail.aspx?ProductID=1948&DownloadID=3748

The trade name OASIS is a registered trademark in the USA of Thomas J. Grebinski, Alamo, California and licensed for use exclusively by SEMI.

For more on this topic, download our whitepaper “Optimize Design Databases by Eliminating Duplicate Data”.

About the author

James Paris is a technical marketing engineer with the Design to Silicon division of Mentor, a Siemens business, supporting Calibre design interfaces. Prior to joining Mentor, he was responsible for analog/mixed-signal physical design implementation and flow development for various IC design companies. James holds a B.S. in computer-aided design engineering and an M.B.A. in marketing.


