The Fragility of the AI Chip Ecosystem
How a handful of specialized companies have become critical links for the most important technology
As this piece goes to Substack ‘press’, two of the most important companies for Artificial Intelligence (AI), ASML and TSMC, have released strong financial earnings. In a business-as-usual quarter, we should expect nothing less and you’ll soon read why.
These two companies, along with a few others in the industry, are not only innovators but have also foundationally enabled Generative AI (GenAI). Each in its own unique way.
One of them is a monopoly. The other has dominance in advanced AI chip fabrication. Their enviable positions are not due to market failure but because they were able to develop specialized technical capabilities that few companies can master. That has given them a solid lead. They have invested billions over the years to get to this stage. So far, they have created tremendous value for their customers and for investors who purchased their stock.
Their industry is a case study in unprecedented innovation driven by scientific creativity, engineering ingenuity and a whole lot of financial investment. But, it has some concentration risks that it needs to address in order to continue to deliver what the world expects from AI.
I hope you enjoy reading this article as much as I did researching and writing it. If you do, would you please help me spread the word by sharing it with others? I would be so grateful.
p.s. Nothing in this article should be construed as investment advice. This post is for information and entertainment purposes only. Please speak to your investment or financial advisor before making any investment decision.
Let’s get into it!
Even AI loves to eat chips!
There are two names that most people seem to be familiar with when it comes to the AI revolution. One is ChatGPT, an artificial intelligence (AI) innovation that has answers to any question you can conceive of (even if it occasionally hallucinates the perfect response).
The other is NVIDIA, an AI chip design company known to most as an investment opportunity that many missed until it suddenly started flying above the radar a couple of years ago.
These two entities are symbolic of the AI boom. The boom rests on a foundation that we don’t actively think about but that drives every. single. AI. model → chips! More narrowly, specialized, advanced AI semiconductor chips.
Artificial intelligence, especially GenAI, is changing our lives in leaps and bounds. It is driven by underlying foundation and frontier models that have been trained on vast amounts of diverse data to learn and recognize patterns and relationships. GenAI is helping us perform many tasks including text and image generation, writing code, research, and so on.
‘Advanced AI’ semiconductor chips provide the specialized computational power needed for these GenAI models to exist, run and advance. There is no AI without chips!
→ The good news is that these chips are available and are helping us drive innovative solutions for healthcare, education, agriculture and everything in between, now and into the future.
→ → The interesting news is that a remarkably small network of companies is responsible for designing, developing and distributing the advanced AI semiconductor chips that are powering one of the most important technologies that humans have invented.
→ → → The fascinating news is that, of this aforementioned small network, only a tiny subset of companies control critical steps in this advanced AI chip ecosystem. ASML and TSMC are two of these!
Indeed, there are significant areas of fragility and single points of failure in this concentrated ecosystem.
WHAAATTTTT??!!!

My intent here isn’t to be alarmist but to build awareness around the role of key players in a growing industry and how that matters for our shared future.
In fact, there is no immediate need for alarm. Progress is progress. The early steps of innovation are almost always taken on shaky ground. Then, the industry self-corrects. This one will too. In due course. It has to because we are playing the long game with AI. And everyone, including governments, is putting their might and money into the game.
But, for now, the fragility remains. And it matters because we are rapidly shifting to an AI-driven economy.
More companies are talking about being AI-first, and hiring and retaining only AI-first employees. New startups are building themselves as AI-natives. Still others are itching to innovate but the availability of advanced AI chips is limited.
Demand for better and faster advanced AI chips is increasing and outpacing supply. There’s also talk of an AI bubble……Before we surrender any back-ups, we need more resilience in the ecosystem.
This article will cover in non-technical language,
What are ‘advanced’ AI semiconductor chips
The critical connectors and possible points of failure
Where these connectors fit into the ecosystem
Why fragility matters
Building resilience in the ecosystem and securing our AI future
Commonly-used acronyms
Advanced AI Semiconductor Chips
First, some high-level definitions to bring us all on the same page.
→ Semiconductors, made up of the words ‘semi’ and ‘conductors’, refer to materials that conduct electricity better than insulators (like rubber), but not as well as conductors (like copper). Silicon is the most common material that meets these conditions.
This balance makes semiconductors suitable for controlling electrical current via transistors. Transistors can turn electrical current on and off. Millions, and in modern chips billions, of transistors are combined on a single piece of silicon and used to power the microprocessors and memory that reside in all modern electronics - computers, phones, home appliances, medical devices and others.
→ → AI semiconductors are a type of semiconductor. They are designed specifically to handle the intricacies of AI algorithms and equipped with higher bandwidth and fast processing power to enable them to tackle huge amounts of data quickly.
→ → → Advanced AI semiconductors are a type of AI semiconductor. Let’s call them A2S for simplicity moving forward in this article. What makes these AI semiconductors ‘advanced’ and suitable for GenAI models are at least these factors -
their node size (7 nanometers, or nm, and below) compared with 14nm and above for standard AI semiconductors. A node is the generation of process technology used to make chips. A smaller number indicates the use of more advanced technology and better performance. For example, smaller transistors mean that more of them can fit on a chip, they use less energy and the electrons travel shorter distances.
that they require ‘extreme’ ultraviolet (EUV) lithography. Lithography is the process of printing circuit patterns onto silicon wafers to create chips. Using EUV lithography, as compared to the older deep ultraviolet (DUV) kind, is akin to using a fine, precision pen instead of a thicker one. Finer is better for similar reasons as with the node.
their superior ability to handle parallel processing and complex math. They also possess specialized high-bandwidth memory (HBM) technology that stacks DRAM chips vertically to move massive volumes of data much faster. Let’s just say that regular memory used in less advanced semiconductors will not perform GenAI tasks quite as well.
**reminder: list of acronyms can be found at the end of this article
A final point on terminology in this section; ‘chips’ is used interchangeably with the word ‘semiconductors’ in the wild, wild world and also in this article.
There are different types of A2S chips such as compute chips, memory chips, networking chips etc. To keep it focused, I am including them collectively under the A2S umbrella. Most key AI tasks need all or multiple of these. So, the collective treatment makes sense for that reason too.
If we were to get thorough here, chips refer to semiconductor-based products, i.e. the complete package of transistors and other semiconductor components placed on a silicon wafer would be called a chip.
The critical connectors and possible points of failure
As discussed earlier, critical requirements in A2S chips include at least these; 7nm or below node-size, EUV lithography and high-bandwidth memory. (Over time, we are likely to see node sizes getting even smaller plus improvements in EUV machines and HBMs).
Now, this is where it starts to get tricky.
Taiwan Semiconductor Manufacturing Company (TSMC) makes ~90% of the chips with advanced AI nodes including 7nm, 5nm and 3nm at scale. It is also known to be the most reliable manufacturer for mass production of these nodes. Samsung has some of these capabilities but its reputation for producing quality node processes lags and it isn’t as sought after by customers like NVIDIA and Broadcom.
Bottom line - one company, TSMC, controls the vast majority of advanced AI chip production!
Globally, ~66% of chip processing capacity is based in Taiwan, an island off the coast of China that is constantly under threat of invasion from China. TSMC is now trying to expand production to other regions in the world.

Without EUV lithography, you cannot make advanced AI chips (7nm and below). The Dutch company ASML is the only company that provides extreme ultraviolet (EUV) lithography machines used in manufacturing the most advanced chips.
If you’re imagining a printer-sized machine, think again! Each machine costs $150-400 million, is the size of a bus, weighs as much as a plane, takes over a year to build, contains thousands of parts that come from suppliers all over the world and has taken ASML decades and billions to develop.
That’s a lot of innovation and investment for one company in one location in Veldhoven, Netherlands! And no other company does it right now. That’s correct; it’s a monopoly.
Bottom line - Not one other company can manufacture this critical equipment that is required for making every advanced AI chip.

Only three companies make High-Bandwidth Memory (HBM) for AI (and we’ve already seen some shortages). These include the market leader SK Hynix, Samsung and a distant third, Micron. These companies supply HBM to NVIDIA, Microsoft, Google, Meta, AMD and others.
HBM requires huge R&D investment and is difficult to scale. Its manufacturing complexity and technical dependencies make errors very costly. Naturally then, like with EUV, the barriers to entry in this small sub-industry are high.
In 2024, SK Hynix reported that it had sold out product through 2025 because demand had clearly and significantly outpaced supply. To make it more nerve-wracking, both major players SK Hynix and Samsung are located in South Korea, creating a geographical bottleneck.
Bottom line - Advanced AI chips must have HBM. There is no alternative technology at this time for A2S.

There are other points of fragility in the system, albeit to a somewhat lesser degree. The fragility is caused by the need for highly specialized, capital and research intensive components or services. As a result, there aren’t too many suppliers for any of these. More often than not, one or two companies dominate the market in each area.
Some of these components and services include advanced packaging techniques for chips, specialty chemicals and inspection equipment. Additional inputs such as ultra-pure silicon wafers, photoresists, and specialty gases are also required and come from just a handful of suppliers, many based in Japan and South Korea.
Where these connectors fit into the ecosystem
Instead of visualizing it as an ecosystem, it is easier to see the bottlenecks when we think of the system as a supply chain. The beginning of this modern A2S supply chain is at the chip design stage and it ends with delivery to the final customer such as OpenAI, Anthropic and others.
The table below shows the main stages of the process, roughly in order. The last row of the table includes a fragility meter which tells you the fragility level at that stage of the supply chain. The fewer the suppliers who can provide critical components or services at that stage, the greater the concentration and higher the fragility.
Once a critical player declares shortages or other disruptions like SK Hynix did, it suspends the process for others along the chain who are reliant on SK Hynix’s components (unless Samsung or Micron can step in quickly). It has the potential to reduce the output of the ecosystem. If a monopoly like ASML has a disruption, those who need its machines have to delay their operations.
Note that this isn’t an exhaustive list of stages or suppliers. The key ones have been included. Several such as the photoresists, specialty gas suppliers and others have been left out to keep the discussion focused on the major areas of fragility and players.
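For readers who like to see the logic behind a fragility meter, here is a minimal sketch of one common way to measure concentration: the Herfindahl-Hirschman Index (HHI), which adds up the squares of each supplier's market share. This is not how the table in this article was built; it is only an illustration. The ASML monopoly and the ~90% TSMC share come from this article, while the split of the remaining fabrication share, the HBM shares and the cut-off values for the labels are assumptions made up for the example.

```python
def hhi(shares_percent):
    """Herfindahl-Hirschman Index: sum of squared market shares (in percent).

    10,000 means a pure monopoly; values near 0 mean many small suppliers.
    """
    return sum(share ** 2 for share in shares_percent)


def fragility_label(score):
    """Map an HHI score to a rough label. The cut-offs are illustrative assumptions."""
    if score >= 5000:
        return "very high"
    if score >= 2500:
        return "high"
    return "moderate"


# Market shares per stage, in percent.
stages = {
    # ASML is the only EUV supplier (from the article).
    "EUV lithography machines": [100],
    # ~90% TSMC (from the article); the remaining 10% is lumped
    # into one hypothetical rival for simplicity.
    "Advanced-node fabrication": [90, 10],
    # Three HBM suppliers (from the article); this split is hypothetical.
    "High-bandwidth memory": [50, 40, 10],
}

for stage, shares in stages.items():
    score = hhi(shares)
    print(f"{stage}: HHI = {score}, fragility = {fragility_label(score)}")
```

The takeaway is the same as in the text: the fewer suppliers that share a stage, the higher the score and the more fragile that stage is.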
There are many interdependencies, cooperation and competition between the key players across the stages (e.g. TSMC uses ASML’s machines to manufacture A2S chips for AI companies) but also within stages (e.g. Broadcom’s networking chips connect NVIDIA’s GPUs in massive AI clusters. These two companies also compete in custom AI chip design).
The industry is evolving fast as companies like NVIDIA, AMD, and Intel are releasing new AI-optimized chips on annual cycles creating intense competition across the entire supply chain.
On one hand, the A2S ecosystem has achieved remarkable efficiency through specialization.
On the other hand, that same specialization makes the ecosystem vulnerable to any shocks, whether due to geopolitical issues, cyber attacks, warehouse fires or anything else that disrupts operations.
In the time it will take the system to be up and running again, there will be delays in deliveries to the gazillion and increasing customer companies which are impatiently waiting to get chips.
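To make the knock-on effect concrete, here is a small sketch of the "weakest link" idea: when stages depend on each other in sequence, the whole chain can only deliver as much as its tightest stage. The stage names loosely follow this article, but every capacity number and the size of the disruption are hypothetical, and a real supply chain has inventories and alternative suppliers that this toy model ignores.

```python
def chain_output(capacities):
    """Output of a sequential supply chain is capped by its tightest stage."""
    return min(capacities.values())


def disrupt(capacities, stage, fraction_lost):
    """Return a copy of the capacities with one stage reduced by fraction_lost."""
    updated = dict(capacities)
    updated[stage] = capacities[stage] * (1 - fraction_lost)
    return updated


# Hypothetical capacities per stage, in arbitrary units of finished chips.
baseline = {
    "chip design": 120,
    "EUV machines": 100,
    "fabrication": 100,
    "HBM memory": 100,
    "packaging": 110,
}

# Suppose one stage loses a quarter of its capacity (an assumed figure).
shocked = disrupt(baseline, "HBM memory", 0.25)

print("Baseline output:", chain_output(baseline))
print("Output after HBM disruption:", chain_output(shocked))
```

In this toy example, a shortfall in a single stage lowers the output of the entire chain, even though every other stage is running at full capacity.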
Does fragility matter?
It does.
So far, there haven’t been any major disruptions. While the fallout from temporary disruptions to the ecosystem might not be as problematic, anything longer term would come at a high cost.
Increasingly, AI is being used in national security applications and critical infrastructure such as medicine, financial systems, transportation, communication and energy. The more that AI gets integrated into other industries, the higher the risk if the single points of failure in the AI ecosystem falter or are unable to fulfill orders as quickly.
Given the rate at which AI demand is growing, it is possible that the fragility of the system will get tested in the coming months.
Concentration is already causing problems
As mentioned earlier, SK Hynix has already warned of shortages for HBM through 2025. This means that some of its partners in the ecosystem will have a waiting period before they can continue to build certain types of A2S chips. Customers will also have a waiting period for obtaining their orders of the finished products.
Even outside of this specific case related to SK Hynix, some companies are delaying AI projects due to chip unavailability. Others are paying a premium for chips and placing orders for more chips several months to a year in advance. This has created an artificial competitive advantage for companies that do have access to chips.
It has also stalled innovation for startups that are unable to get chips for their AI models at any price. According to several reports, one of the frustrations among companies building AI frontier models is that they always need more chips than they expect.
A recent article that appeared in ‘The Information’ shared the following about leading Silicon Valley venture capital (VC) firm Andreessen Horowitz’s method to invest in the most promising AI startups,
“This program, called Oxygen, started with plans of an initial cluster of more than 20,000 chips, we previously reported. Now Midha, 33, and his colleagues have expanded its original single cluster to multiple private clusters, which includes groups of chips rented and bought from different cloud providers. Andreessen Horowitz then offers the servers to portfolio companies—alongside cash—in exchange for equity.”

Direct and indirect economic impact
AI is currently driving massive economic growth (notice how the stock market continues to ride high driven primarily by strong numbers from the AI-related tech companies?). The broader AI market is projected to grow to $4.8 trillion by 2033, according to a report by UNCTAD. Every company, big or small is investing in AI transformation. Entire business models are being built on AI capabilities.
If the ecosystem faces more shortages for whatever internal or external reason, it will slow down the economic engine. Companies won’t be able to deploy solutions, the stock market will suffer and startups will shut down.
We saw this a few years ago when the 2021 chip shortage (for simpler, non-A2S chips) cost the auto industry alone $210 billion in lost revenue. An AI chip shortage would likely result in trillions in lost economic value based on the kind of investment that continues to go into it.

No quick fixes
What makes the concentration and resulting fragility particularly bothersome is that in the short-term, there are few solutions.
Companies cannot change suppliers easily because there are few of those to begin with for specialized components.
There are no substitutes for some of the specialized components. Older chips don’t work as well or at all for advanced AI.
The suppliers cannot scale quickly because new factories take years to get up and running….and heaps of investment dollars.
All of the above being said, we are in the most vulnerable period right now, caught between rapidly increasing demand and supply unable to catch up. Any disruption in the next year or two would be likely to have maximum impact. After that, the industry will self-correct and become more resilient.
Building resilience in the ecosystem
The concentration risks in A2S manufacturing are undeniable. Yet, the industry has continued to function and thrive in spite of the vulnerabilities.
The current chip shortages seem to be temporary. Several factors suggest that the ecosystem may be more resilient than it appears. We won’t know for sure unless it gets tested.
Meanwhile, there are concerted efforts being made, by companies and governments which are acutely aware of the risks, to build more resilience into the system. Due to all these actions, the trend moving forward seems to be toward less concentration.
Self-correcting mechanisms
Rapidly innovating industries experience growing pains but usually self-correct fairly quickly. They are able to because innovation attracts investment which funds the correction. Advanced AI is certainly a good example of this.
The economic incentives align perfectly for every player in this AI chip ecosystem, whether an equipment manufacturer or end customer. Each of them has enormous financial stakes in maintaining stability and expanding capacity. As a case in point, TSMC “increased its expected floor for capacity expansion and upgrades to $40 billion for the full year, up from a previous floor of $38 billion”.
In fact, most of the ecosystem is looking to increase production capacity for AI and spread it out in more places which is one way to self correct. For instance, Intel, Samsung and TSMC are building new fabrication plants in different parts of the world. These and other companies are investing in redundancy, backup systems and disaster recovery to mitigate some of the risk.
Because these companies are interdependent, being each other’s suppliers and customers, they have strong incentives to cooperate and solve problems, where possible. Not to mention the intense pressure from their investors to upgrade capabilities to keep pace with the AI race.
There is also effort being made by chip customers to reduce dependencies, another form of self correction. Take the latest announcement from OpenAI which inked an agreement with Broadcom to develop up to 10 gigawatts of custom AI chips. OpenAI had been reliant on NVIDIA’s GPU chips for a long time. While not a single point of failure per se, NVIDIA is the most sought after chip design partner.
This move with Broadcom dilutes OpenAI’s dependency on NVIDIA chips as it expands its compute power. Developing a custom chip also shows OpenAI’s attempt to become more self-reliant for chips, following the examples of Amazon and Google which make their own chips for their products.
Overall, various players in the industry seem to be pushing toward creating chip solutions in-house as they seek diversification and control over AI infrastructure. By doing so, they can meet their own needs and manage costs in an industry facing shortages and increasing prices. It also eases the pressure on the chip makers to increase capacity pronto!
Innovation and alternative technologies
Business history is replete with stories of fast-growing industries in which high prices and shortages incentivized new entrants, innovation and the development of alternative technologies. This brings more resilience into the industry.
In the days of yore (1970s), the oil crisis caused by OPEC oil embargos and supply disruptions created new opportunities for oil drilling and production in other parts of the world like Mexico and Northern Europe. That was also when the world started to recognize its folly of over-reliance on fossil fuels and ramped up investment in alternative sources of energy such as solar, wind and nuclear.
Closer to home, in the Dynamic Random Access Memory (DRAM, common type of computer memory) sub-industry in the 1980s, Japanese companies dominated with almost 80% of global market share. When demand started to exceed supply, new entrants like Samsung and Hyundai Electronics from South Korea and others entered the market. This reduced the fragility of that ecosystem.
It is likely that we will see a similar pattern emerge in the A2S industry. It will not be easy to develop an alternative EUV machine or one with a different technology that fits the bill. Building a new advanced fabrication plant costs upward of $20 billion and takes almost 4 years. Creating EUV capability would take a decade and potentially over $100 billion. That being said, it is not impossible either with the right attention, policy and money.
There isn’t enough room in this article to go on this tangent but business history is also full of examples of innovations that seemed indispensable at one point only to have their throne usurped by the new technology on the block. In a similar vein, AI algorithms are changing. By intention, they are starting to require less computational power. Maybe their latest evolution could use alternative technologies.
One of the emerging technologies for AI is the chiplet. Chiplets are small (you can probably tell the size from the cute name) specialized chips that work together to deliver an outcome similar to that of a traditional A2S chip.
Here’s an illustration of a chiplet created by Claude AI (which I use for research)
Chiplets aren’t yet as widely used as fully integrated A2S chips but they’re becoming more visible in the market. They provide one possible way around using A2S chips by using a mix of older non-A2S and fancier A2S-level components. This reduces dependency on one supplier and reduces costs, among other such benefits.
Other new technologies for packaging chips are also being developed as I type. Most of the emerging alternative technologies are still….well, emerging…..i.e. under-developed. But, there is hope for them to either fill in the gaps in the current A2S ecosystem or even become the primary architecture.
It’s an unpredictable world that we live in folks, as much as we love to believe otherwise!
Government policies and geopolitical interests
In addition to A2S chip production being in the hands of a few, there is also geographical concentration as I brought up at various points in this article. That also creates fragility, albeit of the geopolitical kind.
Over 60% of chip production takes place on the island of Taiwan. It is outside the scope of this article to get into the China-Taiwan relationship. Suffice to say that it is a tense one for political reasons.
It is also an area of concern for the United States, Europe and other countries which consider semiconductor infrastructure as critical for national security. It is in their best interests to prevent any military friction between China and Taiwan. Ironically, China also consumes chips that are produced in Taiwan while trying to develop its own AI capabilities.
So, the point I am trying to make is that the concentration of sophisticated AI chip manufacturing in Taiwan might actually deter any political aggression from China. I hope I am right. Peace between those two countries translates to uninterrupted A2S chip manufacturing in Taiwan.
To reduce the over-reliance on Taiwanese fabs, governments have introduced policies such as the US CHIPS Act (2022) which “provides funds to support the domestic production of semiconductors and authorizes various programs and activities of the federal science agencies.” This program provides $52.7 billion in funds to develop the American semiconductor industry.
Similarly, the European Chips Act (2023) “will bolster Europe’s competitiveness and resilience in semiconductor technologies and applications, and help achieve both the digital and green transition. It will do this by strengthening Europe’s technological leadership in the field.” This Act provided investment of 43 billion euros toward developing chips technologies.
Acts like these to build resilience in the chip industry might be the first of their kind but will not be the last. Governments are now highly motivated to support public and private investments in AI infrastructure. There are some who believe that these initiatives are too little, too late. But, in this fast moving AI world, any bit helps and better late than never.
Unrelated to the US CHIPS Act but related to US trade policy is the issue of tariffs. As the US government mulls over and acts on imposing tariffs on Taiwan, it may consider giving exemptions to TSMC, a huge sign that it recognizes the outsized impact of one single company on innumerable American businesses.
Conclusion
Ok, so let’s close the loop.
If we were to ignore some of the bad stuff that advancement in AI is sure to bring our way (it already has and there will be more, unfortunately), it has the potential to change our lives in many positive ways.
I cannot think of a better example to drive home that statement than the most recent discovery made by Google’s DeepMind.
Imagine finally and fully figuring out ways to cure cancer, hopefully in the not-too-distant future! AI might actually be able to help.
So, the question isn’t as much whether the ecosystem is fragile. What matters more is whether all the heavy investment and the alignment of economic, political, strategic and business interests can help it manage risks as it gets through its transition to a more resilient system.
Acronyms
CPU = Central Processing Unit (main computer processor)
DRAM = Dynamic Random Access Memory (common type of computer memory)
EDA = Electronic Design Automation (software used to design chips)
EUV = Extreme Ultraviolet (type of lithography light for making advanced chips)
Fab = Fabrication facility (factory where chips are made)
GPU = Graphics Processing Unit (processors originally for graphics, now used for AI)
HBM = High-Bandwidth Memory (specialized fast memory for AI chips)
nm = Nanometer (measurement unit - one billionth of a meter)
TPU = Tensor Processing Unit (Google’s custom AI chips)





