Vladislav Pomogaev

Engineering Graduate – VE7ZAH

CMOS Digital Bubble Level


My term project for the course ELEC 402: Introduction to VLSI Systems was an open-ended CMOS design of a state machine of our choosing. For my state machine I decided to create a bubble level or “spirit level” that can be used to level things horizontally.

I captured my progress in a series of reports and a short video on my LinkedIn. This post is a more in-depth discussion of my project, what I did, and how I did it.

Objective

My open-ended project had these requirements:

  • Come up with an idea for an FSM and implement it in a hardware description language
  • Create an HDL test bench using immediate or concurrent assertions
  • Synthesize into an RTL design using a library with timings
  • Verify and analyze timing by testing the synthesized design with our test bench
  • Use automated layout tools to create a physical design of the FSM
  • Simulate the physical design to verify that the RTL synthesized correctly (directed testing)

In this project I accomplished these goals plus one extra: I also synthesized and tested the design on an FPGA to verify its functionality in the real world.

HDL Design

For my FSM I chose to implement a “bubble level”, also called a “spirit level”. The general idea is that accelerometers/IMUs can be as effective as, and possibly even more effective than, a bubble inside a glass tube for leveling items. At the very least they are probably more accurate.

Example of a bubble level.

The HDL design started with some rough-draft high-level functional diagrams which were beautified for the reports:

Block diagram of the FSM and its relation to other modules. The FSM works by indirectly receiving data from the MPU 6050, a triple-axis accelerometer and gyroscope chip. In between the FSM and the MPU 6050 is the I2C interface IP from Efinix, the FPGA vendor that I wanted to use for testing my design. The IP was configured to interface with the GPIO blocks with the appropriate pull-ups. LEDs would be driven by the FSM using fixed lookup logic. For the purposes of the rest of the project, only the level_fsm module would actually be tested and synthesized.
Flowchart diagram of how the FSM works. Essentially, the FSM needs to send at least one request to the MPU 6050 to wake the module up. Conveniently, this is usually done by setting the accelerometer scaling factors. Once configured, the FSM simply loops to query the latest acceleration from a single axis. The gyroscope is not used in this project, but could be added later using a Kalman filter or similar to take into account acceleration when moving the device.
Dataflow diagram of the FSM. As previously stated, a wake-up call is first sent to the MPU 6050. Then data is queried from it in a loop. The FSM also makes sure that the MPU 6050 is functioning and responding correctly. If it isn’t, an error code is displayed with an error LED.
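The flow in these diagrams can be sketched as a small behavioral model. This is a Python illustration, not the project's SystemVerilog: the state names and the `step` function with its `ack` and `error` flags are hypothetical stand-ins for the real handshake with the I2C block.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names chosen for this sketch only.
    RESET = auto()
    WAKE = auto()        # write to the MPU 6050 to wake it up
    WAIT_WAKE = auto()   # wait for the I2C block to answer the write
    READ = auto()        # request one axis of acceleration
    WAIT_READ = auto()   # wait for the read data to come back
    DISPLAY = auto()     # drive the LEDs from the lookup table
    ERROR = auto()       # light the error LED and stall


def step(state, ack=False, error=False):
    """Return the next state given the (assumed) I2C status flags."""
    if state is State.ERROR:
        return State.ERROR               # stall until an external reset
    if state is State.RESET:
        return State.WAKE
    if state is State.WAKE:
        return State.WAIT_WAKE
    if state is State.WAIT_WAKE:
        if error:
            return State.ERROR
        return State.READ if ack else State.WAIT_WAKE
    if state is State.READ:
        return State.WAIT_READ
    if state is State.WAIT_READ:
        if error:
            return State.ERROR
        return State.DISPLAY if ack else State.WAIT_READ
    return State.READ                    # DISPLAY loops back to query again
```

Calling `step` repeatedly from `State.RESET` with `ack=True` on the wait states walks through the wake-up once and then cycles through read and display, while `error=True` on either wait state parks the model in `State.ERROR`.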

The FSM for this project was written in SystemVerilog, and synthesized and simulated with Icarus Verilog. The IP was generated with the Efinix Efinity IDE.

Description of the FSM module alongside the lookup table which was generated in Excel. Essentially, this is a lookup table that maps a single axis acceleration reading to an angle and gray-code-like output for the LEDs. Changing the sensitivity of the device would require changing this table.
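To illustrate what such a table encodes, here is a minimal Python sketch that regenerates a comparable mapping. It is not the Excel table from the project: the five-LED patterns and the `DEGREES_PER_STEP` sensitivity are assumptions made for this example. The 16384 LSB/g figure is the MPU 6050 sensitivity at its ±2g full-scale setting.

```python
import math

LSB_PER_G = 16384  # MPU 6050 sensitivity at the +/-2 g full-scale setting

# Hypothetical 5-LED patterns, ordered from one tilt extreme to the other.
# Neighbouring entries differ by exactly one LED (the gray-code-like property).
PATTERNS = [
    0b10000, 0b11000, 0b01000, 0b01100, 0b00100,
    0b00110, 0b00010, 0b00011, 0b00001,
]
DEGREES_PER_STEP = 2.0  # assumed sensitivity; changing it changes the table


def tilt_degrees(raw):
    """Convert a signed 16-bit single-axis reading to a tilt angle in degrees."""
    g = max(-1.0, min(1.0, raw / LSB_PER_G))  # clamp so asin stays defined
    return math.degrees(math.asin(g))


def led_pattern(raw):
    """Pick the LED pattern for a raw reading, saturating at either end."""
    centre = len(PATTERNS) // 2
    index = centre + round(tilt_degrees(raw) / DEGREES_PER_STEP)
    return PATTERNS[max(0, min(len(PATTERNS) - 1, index))]
```

In hardware the same mapping would be a fixed table indexed by the reading, so the trigonometry only runs when the table is generated, never in the FSM.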
Ports of the FSM.
States of the FSM and their bit-mappings. I use “glitch-free” state machine techniques. One of these techniques is to partition the register that holds the state of the FSM into the bits that define the state and the bits that directly drive glitch-free outputs. (The first four bits of each state are the state bits; the rest are the FSM outputs.) This is in contrast to using combinational logic on the state to drive FSM outputs, which can lead to glitches (e.g. using XOR gates on the state). A very good write-up about the methodology I used can be found here.
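The partitioning idea can be shown with a few lines of Python. This is only a sketch of the encoding scheme: the state names, the three output bits and their meanings are hypothetical, and only the four state bits come from the description above.

```python
STATE_BITS = 4   # per the text: the first four bits identify the state
OUTPUT_BITS = 3  # assumed number of registered outputs for this sketch


def encode(state_id, outputs):
    """Pack a state id and its output bits into one state-register value."""
    assert 0 <= state_id < (1 << STATE_BITS)
    assert 0 <= outputs < (1 << OUTPUT_BITS)
    return (state_id << OUTPUT_BITS) | outputs


def outputs_of(state_word):
    """Outputs are a plain bit slice of the register, with no decoding logic."""
    return state_word & ((1 << OUTPUT_BITS) - 1)


# Hypothetical states; output bits are (i2c_start, led_enable, error_led).
IDLE = encode(0b0000, 0b000)
REQUEST = encode(0b0001, 0b100)
DISPLAY = encode(0b0010, 0b010)
ERROR = encode(0b0011, 0b001)
```

Because `outputs_of` is just a slice of the register, each output comes straight from a flip-flop and changes only on a clock edge, which is what keeps it free of combinational glitches.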

HDL Verification

Diagram of how the FSM was connected to the test-bench. I implemented a directed testing methodology and wrote a test bench with timing-based logic in SVA. The test-bench and FSM were compiled and simulated with Icarus Verilog and visualized with GTKWave, both free tools.

The test-bench logic tests a subset of the functions of the FSM. It tests:

  • Resetting the FSM
  • Sending a request to the I2C block
  • Waiting for responses appropriately
  • Validating responses from the I2C block
  • Outputting to the LEDs
  • Identifying I2C read and write errors

It doesn’t test or permute all of the possible state machine inputs and outputs, but it covers the basic functionality, which is good enough for a class project. For larger systems I would use formal verification techniques instead.

Annotated waveform output from the test bench, viewed in GTKWave. First the FSM is reset and all levels are brought to valid values. Then the basic loop-like validations are done, which test the features listed above.

After test-bench validation the top-level design was synthesized and the bit-stream was generated and uploaded to the Efinix Xyloni FPGA development board. I set up the circuit on a bread-board by soldering some pin-headers to the dev board and connecting some LEDs as well as the MPU 6050. The design was first run at a very slow clock speed, then the clock was increased to the full 50MHz. There were some errors at this stage, but they all had to do with the configuration of the interface and PLL blocks of the FPGA rather than the design. At this point I had a functioning prototype that could be used to level things!

Cadence RTL Synthesis Flow

At this point we have completed the first few steps of the RTL synthesis flow. The next few steps include the actual RTL synthesis, library mapping, more simulation, and timing analysis. (Diagram obtained from CMOS VLSI Design: A Circuits and Systems Perspective by Neil Weste and David Harris.)

I then used Cadence to go through the rest of the RTL synthesis flow and continue turning the FSM into a layout. I configured Cadence Encounter to map the FSM (and only the FSM) to the OpenCell library in the 15nm FreePDK technology using a TCL script. The result was a mapping to 170 elements taking up about 67um^2.

Output from Cadence Encounter. level_fsm took up about 67um^2 in the 15nm library.

Timing was also verified with Cadence at this step. For the clock, the timing constraint was set to 200MHz and the simulation speed to 100MHz, while the target execution frequency was set to a measly 50MHz. At this technology level these values should be easily achievable. The large margin is so that if someone wants to overclock the design 2x they can (the I2C controller has a default operating frequency of either 100MHz or 50MHz). Timing tests passed in the SS (slow-slow) corner.

We see some changes from the previous simulation. Most notably, there is now a delay between the clock and the output of signals from the FSM. These delays are different for different signals, as expected. There are still no delays on the input signals, though, because those come from the original test bench. We make the assumption that the signals arrive at or before the clock edge. In this figure at around 380000ps we see the delay of the command byte relative to the reference clock. I assume that if a signal were to propagate at uneven times through a combinational circuit we would see that in this simulation. However, all signals come from registers, so we cannot see a difference in the arrival of different signals on the same bus. Since the frequency we are running this project at is so low compared to what the technology can achieve, the delays have little effect on the functionality of the circuit.
The original test bench was run on the newly compiled design. The test bench passed all assertions, as it did last time, because all assertions are checked on the next clock cycle after a state change. The simulation frequency was adjusted from 1GHz to 100MHz to better reflect the timing when run at 50MHz on the FPGA. The clock constraint was also adjusted to 200MHz to ensure positive slack and lots of headroom.
Looking closer at the timing report, we can see the loads, slews and delays of various mapped components along the longest path. With 4899ps of slack out of the allowable 5000ps, this design is extremely simple relative to the technology level.
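As a quick sanity check on those numbers, the arithmetic can be written out in Python. This is a back-of-the-envelope sketch using only the figures quoted above. It treats the period minus the slack as the time consumed on the longest path, which lumps in whatever setup and uncertainty terms the tool accounts for.

```python
def clock_period_ps(freq_hz):
    """Clock period in picoseconds for a given frequency in hertz."""
    return 1e12 / freq_hz


constraint_ps = clock_period_ps(200e6)   # 200 MHz constraint from the text
slack_ps = 4899.0                        # slack quoted from the timing report
used_ps = constraint_ps - slack_ps       # time consumed on the longest path
target_ps = clock_period_ps(50e6)        # 50 MHz target execution frequency

print(f"constraint period: {constraint_ps:.0f} ps")
print(f"time used on longest path: {used_ps:.0f} ps")
print(f"share of the 50 MHz period: {used_ps / target_ps:.2%}")
```

The longest path uses only about 101ps of the 5000ps budget, so the 20000ps period at 50MHz leaves an enormous margin.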
Likewise, power was also estimated by Cadence. Doing some rough calculations, an SR626SW coin cell battery could power the FSM part of this device for roughly 100 days: 28000uAh * 1.55V nominal = ~43400uWh, and 43400uWh / 16.6uW / 24h/day ~= 109 days.
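The same estimate written out in Python, as a rough sketch that assumes an ideal cell (no self-discharge, no regulator losses) and counts only the FSM's estimated power, not the accelerometer or LEDs:

```python
capacity_uah = 28_000     # SR626SW capacity used in the estimate above
nominal_v = 1.55          # nominal cell voltage
power_uw = 16.6           # estimated FSM power from the Cadence report

energy_uwh = capacity_uah * nominal_v   # microwatt-hours stored in the cell
hours = energy_uwh / power_uw
days = hours / 24

print(f"{energy_uwh:.0f} uWh / {power_uw} uW = {hours:.0f} h = {days:.0f} days")
```

This works out to a little under 109 days, which is where the rough 100-day figure comes from.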

Standard Cell Place and Route Design Flow

With the RTL synthesis flow complete and the design verified, I moved on to the standard cell place and route design flow to create the physical layout of the chip.

Standard cell place and route design flow (diagram obtained from CMOS VLSI Design: A Circuits and Systems Perspective by Neil Weste and David Harris). We begin with the netlist from the previous section and continue to place and route.

Now, one minor gotcha for this section. For technical reasons, the lab course was not able to provide us with the 15nm layout files in time for our project to continue. We had to fall back to the OpenCell library in the 45nm FreePDK technology instead of the planned 15nm. This means that the previous section was redone for the 45nm library before moving on with the place and route. With such a simple design, it did not affect the performance much.

Layout of the bubble level FSM in the 45nm FreePDK library. The ports were aligned as such: inputs on the left, outputs on the right, and power on the bottom. The very outer dimensions are 33um x 28um, for a total area of 924um^2.
The Verilog file was also translated to an RTL layout and displayed in viewable form. Here we see the entire FSM schematic in terms of logic gates.
The previous RTL design was transformed into a new cell view. This cell view was placed in a test-bench schematic. Each output of the FSM was loaded with a 10fF capacitor. An array of 1V supplies was used to drive the input signals.
Here is a closeup of the test-bench schematic. The voltage input signals were set up manually using an array of input voltage levels that corresponded to the input vectors used in the original Verilog test bench.
The simulation was set up using ADE. The simulation duration was the same length as the original Verilog test bench file.
For comparison, here are the input and output vectors of the original Verilog simulation. This simulation does the following, in this order: 1) FSM reset 2) Write to accelerometer to wake it up 3) Wait for valid response 4) Read from accelerometer register 5) Display level on LEDs 6) Read from accelerometer register, but this time receive an error 7) Verify error LED output 8) Reset FSM 9) Write to accelerometer to wake it up, but this time receive an error 10) Verify error LED output 11) Stall forever. This procedure tests an overwhelming majority of the FSM.
Here is the same test bench but now it is executed in Virtuoso. As we can see the waveforms are the same, but we see characteristic RC delays. The multi-bit vector values were the same in both cases. Rise time for input signals was 2ns. The layout passed the test!
I encountered an interesting problem while troubleshooting the simulation. Part of my vectors from the Verilog and Virtuoso simulations were not lining up. Upon an FSM reset, the slave addr, command byte, and num bytes should go to 0, then return to their default non-zero values after the reset. This happens in the Verilog simulation, but not in the Virtuoso one. The rest of the FSM’s functionality would continue as normal, but without those signals the FSM would not read from or write to the device correctly. I found that adjusting the length of the reset signal to be slightly MORE than two cycles would solve this issue. However, the root cause of the issue is poor test bench design. It was ultimately a setup-time violation on the reset line that would cause the FSM to skip a state.
What the FSM outputs should look like according to the Verilog test bench. Note how the slave address and other buses change their values at 370ns, but in the Virtuoso simulation they do not.
Extending the reset pulse in Virtuoso solved this bug. Now the simulations are functionally identical.
Another thing I played around with was measuring the propagation delay. I found the worst-case delay to be on the i2c command byte bus, at a staggering 722ps. Since the FSM is rather simple and its function is really just to pipe data from the accelerometer to the LEDs, I can be quite confident that this is in fact the worst-case delay.
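For reference, a propagation delay like this is conventionally measured between the 50% points of the clock edge and the output transition. The Python sketch below shows that calculation on sampled waveform data. It is not the measurement procedure used in Virtuoso for this project: the function names are hypothetical, and the 1V default for `vdd` matches the 1V supplies in the test-bench schematic.

```python
def crossing_time(times, values, threshold, rising=True):
    """Time of the first threshold crossing, linearly interpolated between samples."""
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        crossed = (v0 < threshold <= v1) if rising else (v0 > threshold >= v1)
        if crossed:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None  # the waveform never crosses the threshold


def propagation_delay(times, clk, out, vdd=1.0, out_rising=True):
    """Delay from the clock's rising 50% point to the output's 50% point.

    Simplification: uses the first crossing of each waveform, so the
    samples passed in should cover a single clock edge.
    """
    half = vdd / 2
    t_clk = crossing_time(times, clk, half, rising=True)
    t_out = crossing_time(times, out, half, rising=out_rising)
    if t_clk is None or t_out is None:
        return None
    return t_out - t_clk
```

Given exported samples of the clock and one bit of the command byte bus around a single edge, `propagation_delay(times, clk, out)` returns the delay in whatever time unit `times` uses.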

Conclusion

For as simple as this project was, it was an exciting introduction to the world of CMOS VLSI. With Moore’s law coming to an end, our “free-lunch” speedups will begin to be constrained by physical limitations. We will be forced to think harder about the algorithms we are running and to design new, more efficient hardware to run them. I predict that, like Moore’s law, we will see an exponential growth in the quantity of unique hardware designs in the near future.

If you wish to view the source code you can download it here. The top-level folder structure is arranged by progress through the project and includes additional homework handed in as part of the course.