Finding YC Startups to Critique with Vector NLP and a Critique of Atopile

Nov 28th 2024

Intro

Over the past year or so I've been trying to start a "business". While looking for a problem that my business can solve, I came across a quote by Paul Graham in his essay, "How to Get Startup Ideas":

The way to get startup ideas is not to try to think of startup ideas. It's to look for problems, preferably problems you have yourself. The very best startup ideas tend to have three things in common: [A] they're something the founders themselves want, [B] that they themselves can build, [C] and that few others realize are worth doing.

My problem is that I feel like I don't have any problems that are solvable by a startup that meet points A and C. All of the problems I deeply care about are systemic issues. Global warming, housing crisis (I live in Vancouver), rampant capitalism, etc etc, are all issues that I would love to have solved, but they require far more than my input, and are themselves difficult ways to make money, or at the very least they are a very roundabout way of "starting a business".

Another angle on this is the following insight I have. I'm a deeply technical person. I've only worked at engineering companies with peers that I would describe as very capable, well educated, confident, hardworking, and generally wanting to solve problems, a.k.a. I got very lucky and have only worked at dream companies. This means that if, God forbid, there is a "problem" at my workplace, there are a dozen or so engineers just waiting around to fix the issue or design a solution. If we had a problem and we needed a solution, why should we go use some startup's services? That takes time, the solution might not exactly fit what we need, it won't integrate as well, and it's probably going to be expensive and have opaque pricing hidden behind a "contact us" button. Why wouldn't we just fix it ourselves?

So ironically, deeply technical people might not be exposed to those ripe "problems" that Paul is talking about because the problems they have in their local network are already solved. For me, that leaves the final problem I have: the fact that my job exists! If I could automate it I would.

Systematic Search for Startup Ideas

In this video, YC's Jared Friedman comments that it's possible to sit down and explicitly come up with startup ideas. While the method isn't ideal, I was so stuck that I still wanted to give it a shot. This led me down a road of drawing mind-maps, having circular conversations with ChatGPT, and critiquing YC startups, which is what I'd like to describe now. Here is the process:

Look at YC's list of companies. Identify the companies that are working in a problem space that you have experience in. Then, critique the companies as an expert in the field. I ask myself the following questions:

what problem are they claiming to solve? what problem are they actually solving? are they the same?
how frequent is this problem?
is the solution good?
is there an even bigger problem that this startup is *not* solving that you can solve?

Bag-of-words with Cosine Similarity

My general approach is to take a list of my technical skills and sort the YC companies based on how close they are to my list of technical skills. Then use that list to find which companies I should be looking at to get inspiration.

Since YC has funded over 5000 startups, taking a look at all of them and identifying if you are expert enough to critique them is a chore. In comes my favourite NLP tool on planet earth, a bag-of-words vector-space model. This tool is so useful that it's my go-to method for text-based search and text-based clustering. The Wiki article is interesting and I recommend you read it.

"Scrape" the following page by loading it, right clicking, and saving the body as an HTML file. The output is a truncated DOM that is an array of links, with divs scattered everywhere. Inside there's some spans that provide details about the company. You have to scroll down to get the whole page.
Clean up the DOM by converting it into JSON.
Load a sentence-transformers/all-MiniLM-L6-v2 model and produce a vector from the description of every company.
Also, produce a vector from a file that lists a collection of my skills.
Calculate the cosine similarity between every company's description vector and my skills vector, then order the companies by similarity.
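The last three steps can be sketched in a few lines. This is a minimal illustration and not the code from the repository: the company records, the skills string, and the `toy_embed` function are all made up, and `toy_embed` is a literal word-count stand-in so the sketch runs with only the standard library. To use the model named above, you would swap it for something like `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(...)` from the sentence-transformers package and use a dense cosine such as `sentence_transformers.util.cos_sim` in place of the dictionary-based one here.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedding (an assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_companies(companies, skills_text, embed=toy_embed):
    """Return (score, name) pairs sorted from most to least similar."""
    skills_vec = embed(skills_text)
    scored = [
        (cosine(embed(c["description"]), skills_vec), c["name"])
        for c in companies
    ]
    return sorted(scored, reverse=True)


# Hypothetical records, shaped like the output of the JSON clean-up step.
companies = [
    {"name": "BoardCo", "description": "Design circuit boards with code"},
    {"name": "SnackCo", "description": "Healthy snacks delivered to offices"},
]
skills = "python code firmware circuit boards pcb design"

ranking = rank_companies(companies, skills)
print(ranking)
```

The `embed` parameter is there so the ranking logic stays the same whichever embedding is plugged in.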

The GitHub is here. And yes, this is AI generated code, but debugged by a human of course!

The Unreasonable Effectiveness

Bag-of-words vector techniques work so well that I'm astounded every time.

For example, take a look at the description of DryMerge, a no-code LLM solution to writing scripts:

Automate work with plain English

If you just look at the individual words in this sentence, there's no mention of "AI" or "scripts" or "programming" or anything of the sort. Yet this description matched my predefined skill set very closely, and that skill set includes "AI" and "scripts" and "programming" etc. Why? I speculate it's the "automate work" part. It's possible that the vector embeddings know that "AI" is synonymous with "automate work", so the model encodes that in a vector component, which then matches my vector closely.
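To make that observation concrete: if the vectors were literal word counts, two texts with no words in common would have a cosine similarity of exactly zero, so the match presumably comes from what the embedding model has learned and not from word overlap. A small sketch, where the skills string is a made-up excerpt (the real skills file isn't shown here):

```python
import math
from collections import Counter


def word_counts(text):
    """Literal bag of words: one dimension per lowercase word."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


description = "Automate work with plain English"
skills = "AI scripts programming"  # hypothetical excerpt of a skills file

overlap_score = cosine(word_counts(description), word_counts(skills))
print(overlap_score)  # 0.0, since the two texts share no words
```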

A Review of Atopile

Atopile is at the top of my list of startups that match my skillset. Their description is:

"We make tools to design electronics circuit boards with code"

Let me describe some of the challenges in PCB design that I think Atopile is trying to address.

The first and biggest problem when designing PCBs is to transform your project requirements into a schematic diagram that fulfils said requirements. The requirements could be what you want your gadget/subsystem to do, what inputs and outputs you have, and the transformations to electrical state you want to perform. You need to take a good understanding of the requirements and come up with a circuit diagram that satisfies those requirements. This means taking components and drawing logical/electrical connections between them. The difficulty comes from:

Choosing which components to use, while taking into account manufacturability, supply chain, cost, mechanical requirements, and more... Think of this as a search in the space of all possible designs, except the decisions you make are also bounded by time, and every decision you make fractals out with a CMOS-like fanout of 4.
Understanding and fulfilling the functional requirements of each component. This means reading datasheets and adding/connecting components in certain ways. You can spend hours designing a PCB that displays video on a monitor, but if you forget a single pullup resistor on the MCU, then your entire project won't work.
Understanding vague and confusing datasheets. Sometimes there is missing information and you need to call up the part manufacturer. Sometimes the datasheet does not fully explain the design and you gloss over this fact by not asking yourself the right questions. Other times, you want to use a component in a way that is unconventional. Sometimes, the writer of the datasheet just needs to take a technical writing course.
Preventing simple blunders. I can't find a source for this very easily, but early EDAs for schematic capture had this absolutely maddening bug. Sometimes, nets would not connect even though they looked like they should, if you connected them in just the wrong way. This has led to designers adopting very conservative design guidelines. I'm not saying it's unjustified, I'm just saying it's a lot of work to check these things.
The requirements hierarchy in a schematic design is not a tree; it's more like a cyclic graph. Sometimes when you want to make a change to your design you have to change multiple components, which necessitates further changes, and so on...
There's probably more to it than this, but I'll stop for now...

Where I think atopile shines is when it comes to capturing (some of) the electrical requirements and making sure that those requirements are fulfilled. For example, if one of your requirements is to "have" an STM micro on your board, that's pretty easy to add. If that micro itself has a requirement of being powered, it's implied that there needs to be a connection to a power supply. With ato's code, it seems like it's easy to do a "static" check for this.

That being said, the usefulness of this tool will come down to how deep these nested requirements can be checked, and how visible these checks are to the designer.

For example, let's say you have a power supply powering an MCU. The connection between the power supply and MCU is checked to see if it exists. What about the input connection to the power supply? I hope that's checked too. It seems like the language-based descriptions would make it easy to check for these things, either manually, or via an automated linter.
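Here is a rough sketch of what such a nested check could look like. This is not atopile syntax or its API; the netlist shape, the part names, and the rule that a part with no power input counts as a source are all assumptions made for illustration.

```python
# Hypothetical netlist: each part lists the net feeding its power input
# and, for supplies, the net it drives. None of this is atopile syntax.
parts = {
    "mcu": {"power_in": "3V3", "drives": None},
    "ldo": {"power_in": "VBUS", "drives": "3V3"},
    "usb_connector": {"power_in": None, "drives": "VBUS"},  # external source
    "sensor": {"power_in": "1V8", "drives": None},  # nothing drives 1V8
}


def unpowered_parts(parts):
    """Return sorted names of parts whose supply chain never reaches a source."""
    drivers = {p["drives"]: name for name, p in parts.items() if p["drives"]}

    def is_powered(name, seen):
        net = parts[name]["power_in"]
        if net is None:  # assumed to be a source (connector, battery)
            return True
        if name in seen:  # guard against loops in the supply chain
            return False
        driver = drivers.get(net)
        if driver is None:  # no part drives this net
            return False
        return is_powered(driver, seen | {name})

    return sorted(name for name in parts if not is_powered(name, frozenset()))


problems = unpowered_parts(parts)
print(problems)  # ['sensor']
```

The check follows the chain upstream: the MCU is fine only because the regulator feeding it is itself fed by the connector.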

Let's say you have an FPGA on your board. The FPGA has several I/O banks with different voltages. The datasheet specifies that depending on the voltages, the supplies need to be ramped up in a certain order. Presumably, this is because there are diodes between the rails of the chip, but that detail is not specified in the datasheet. Would this tool be able to check that this requirement is satisfied? Probably not! That would require a structured representation of this requirement to check against. While it's possible to store and check for these kinds of things, it's a) not standardized, so every part needs to have a unique requirements specification b) some requirements are difficult to contain in a structured form c) there are always exceptions to the rule, minor caveats that cannot be represented.
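To show what "a structured representation of this requirement" might mean, here is a minimal sketch. The `RampOrder` class, the rail names, and the timings are hypothetical and not taken from any real datasheet or from atopile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RampOrder:
    """Hypothetical structured form of: rail `first` must ramp before rail `then`."""
    first: str
    then: str


def sequencing_violations(rules, ramp_start_ms):
    """Return the rules violated by the given rail start times (milliseconds)."""
    violations = []
    for rule in rules:
        t_first = ramp_start_ms.get(rule.first)
        t_then = ramp_start_ms.get(rule.then)
        # A rail missing from the design data cannot be verified, so flag it.
        if t_first is None or t_then is None or t_first >= t_then:
            violations.append(rule)
    return violations


# Made-up rail names and timings for illustration only.
rules = [RampOrder("VCCINT", "VCCAUX"), RampOrder("VCCAUX", "VCCIO")]
design = {"VCCINT": 0.0, "VCCAUX": 2.0, "VCCIO": 1.0}

bad = sequencing_violations(rules, design)
print(bad)
```

Even this tiny schema only covers one shape of requirement. A rule that depends on which bank voltages are selected, or one with an exception buried in a footnote, would need its own representation, which is the standardization problem described above.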

What about voltage ripple? Startup currents? Power dissipation?

Consider that some of these datasheets are literally thousands of pages long. You might think that your PCB is "very simple" and doesn't have these types of parts, but unless you have purely passive components on your board, that's probably not the case. The absolute barebones MCUs have datasheets in the hundreds of pages. How much time are you really saving by defining your schematic in a text-based form if you still need to read the entire datasheet, understand it, and check the requirements?

Now, maybe you can use an LLM to understand these requirements directly from the datasheet, which will solve the problem of having the requirements be laid out in a structured form. Would you trust this kind of check is done correctly by the LLM? Would the LLM know to check certain things if it isn't in the training data?

To summarize the schematic capture benefits of atopile: the benefits are limited to the most basic of checks to your design that are not currently well represented in EDA tools.

The second biggest problem with PCB design is transforming your schematic into a PCB layout that faithfully implements it. What I mean by this is the PCB should implement the schematic in the ways that matter.

Most digital designs have some sort of high-speed components on them. Similar to my arguments above, this tool does not address checking their requirements. High-speed design? Impedance checks? These are the problems that take up the most of a designer's time and are the most likely to be screwed up.

Now, the one benefit I see of this tool is that it contains sample layouts for groups of components. I think this is the real winner in this tool. I start every PCB layout by grouping related components and performing a "mini-layout" on said components, while taking into account thermal dissipation and mechanical requirements. Usually these components are already grouped on the schematic page in a hierarchical sheet. Only after these mini-layouts are done do I go and group these modules together.

Most of the time I place related components on a single side of the board to begin with. I move some components to the other side only if I need to.

Atopile seems to have a database of these "modules" already pre-laid out, which could save time.

But... let me be really cynical here. How much of a benefit is this really to a designer or layout engineer?

All but the most simple of board designs of mine have gone through more than one iteration. Every single board I ever made that ended up in a product has had multiple iterations. That means that after the initial grunt-work of Rev 1.0, the rest of the layout changes are minor and don't require these "mini-groupings".

Likewise, most of the Rev 1.0 boards I worked on have themselves been derivations of previous designs/other boards. We simply took the project files, made a copy, changed the name of the project, ripped up 50% of it, and re-made the design.

The most difficult layout issues have been the aforementioned high-speed traces, which congregate around DDR, memory, LVDS, and high-speed busses, which means they are concentrated around the MCU. There is massive benefit to reusing the layout around the MCU from previous projects because that board has already been validated. You know with certainty that the transformation from circuit to PCB was correct before, so your chances of performing the transformation correctly again go up.

So to me, the benefits of this minor layout tool are diminished the more reuse exists in a PCB design.

Where can Atopile do better?

CI integration is akin to checklists in conventional PCB design. Companies tend to build checklists from past failures and from failure modes known to the engineers working at the company. You don't want to be one of those individuals whose name is memorialized in a checklist. Trust me, I would know.

There tends to be a "big" checklist for all designs. This checklist is kept in mind by the designer during schematic/PCB design, and then reviewed at critical milestones. Usually it's stored as some sort of text file, web form, or spreadsheet.

There are several annoyances with this checklist system:

The "big" checklist is sometimes very big. It contains lots of items that are possibly not applicable to the current design.
The application of the "big" checklist tends to be done many times over by the designer, often implicitly, informally, and redundantly. For example, if you modify one part of the circuit, do you need to go re-check other parts of the circuit? It really depends on what it is. If you have a path in the hypothetical "dependency graph", maybe you should.
There are lots of things that you can check off once and not have to check again unless you want someone to double-check your work.
This checklist applies to both the PCB layout and the schematic, and changing either one may cause checks in one to become invalid in the other.
The checklist varies between companies, but should it? The checklist should really vary by the project you are doing. If a company is new and doesn't have a checklist in place, or the checklist is quite narrow, maybe there should be some way to easily get a much bigger checklist?
Some items on checklists can be checked with a script. Others are checked by looking at a particular area of a schematic. Others, an area on the PCB. If you write a script to check something, how can you check the script? If you do a manual check, how can you be sure someone else doesn't change the design and invalidate your check?
Let's say a design comes back and it fails for some reason. The business would want to not repeat the mistake again. How can you have traceability in this system? It seems like the only traceability is in the minds of the workers who come, go, or get laid off.

I can think of a lot of different ways of solving these problems, some of which can be solved via existing tools.

However, I think that Atopile is uniquely positioned to solve this checklisting problem. It's not perfect, but having a dependency graph you can derive from your ato code would help to resolve dependencies. By importing modules in a way akin to Python packages you can store which parts of a checklist are applicable. Having integrations into KiCad would help identify which checks are invalidated when changes are made (PCB or schematic). Scripts can be used to automate some checks. CI and version control? I propose 'git blame'.
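As a rough sketch of the invalidation idea, assuming you already have a dependency graph between blocks and a checklist where every item names the block it covers. The block names, the edges, and the checklist items below are all hypothetical, and nothing here is an atopile or KiCad API.

```python
from collections import deque

# Hypothetical dependency edges: a change to the key can affect each listed block.
affects = {
    "power_supply": ["mcu", "ddr"],
    "mcu": ["ddr", "usb"],
    "ddr": [],
    "usb": [],
}

# Hypothetical checklist: each signed-off check names the block it covers.
checks = {
    "3V3 ripple within spec": "power_supply",
    "MCU decoupling caps placed": "mcu",
    "DDR traces length-matched": "ddr",
    "USB ESD protection present": "usb",
}


def invalidated_checks(changed_block):
    """Return checks to redo: the changed block plus everything downstream of it."""
    dirty = {changed_block}
    queue = deque([changed_block])
    while queue:
        for nxt in affects.get(queue.popleft(), []):
            if nxt not in dirty:
                dirty.add(nxt)
                queue.append(nxt)
    return sorted(name for name, block in checks.items() if block in dirty)


to_redo = invalidated_checks("mcu")
print(to_redo)
```

In this example, touching the MCU block leaves the power supply check signed off and flags the three checks on the MCU and the blocks downstream of it.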