Date: 6th Oct, 2022
Lately, Twitter has been in the limelight for a variety of reasons, and no one can say for sure whether they are good or bad. The popular social media company is going through an upheaval that has led to layoffs, ignited discussions about potentially bloated engineering practices, and raised concerns about the possibility of a complete application failure.
There has been serious discussion about whether the application will stop dead in its tracks because of all the people leaving the company. The platform has faced minor outages before and recovered from them. However, according to Downdetector, users filed 2,138 outage reports for Twitter on December 11th this year.
Of the users who reported problems, 63% said the app was not loading new tweets, while 36% reported problems with the Twitter website. December 11th marked the second major Twitter outage in a matter of weeks, following the November 4th incident. On that day, the microblogging platform was down for a couple of hours. Luckily, the damage was restricted to its desktop version. But does that mean the application is doomed, or is this just par for the course for such a massive application? And in either case, is there something here for application performance professionals to learn?
From the outside, we can learn plenty of valuable lessons from the Twitter application. So, let’s get on with it.
Why Does Twitter Require a Complex Infrastructure?
For fiscal year 2021, Twitter reported $1.8 billion as its cost of revenue, and infrastructure accounts for a major share of that cost. It is therefore no surprise that Elon Musk has instructed his engineering team to cut the infrastructure budget by $1 billion.
What do we know about Twitter’s complex infrastructure? Before the advent of the cloud, Twitter (like most other companies in the mid-2000s) ran its application in self-managed data centres. In 2018, it migrated a major portion of its data infrastructure to Google Cloud, and in 2020 it signed a multi-year contract with Amazon Web Services to serve real-time tweets from cloud servers.
The Twitter analytics platform is the component that mines the company’s Big Data to deliver the distinctive Twitter experience of personalized content and trending news snippets. In recent years, this platform has grown rapidly in users, complexity, and use cases. For instance, in 2010 Twitter had only 100 employees, and its data analytics team ran its daily operations on a 30-node Hadoop cluster. Today, the company has over a thousand employees and thousands of Hadoop nodes spread across multiple data centres. Every day, the Hadoop data warehouse ingests around 100 terabytes of raw data.
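To put the daily ingest figure in perspective, here is a rough back-of-the-envelope sketch that converts 100 terabytes per day into an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and an evenly spread load, which real traffic would not follow. The function name is purely illustrative and has nothing to do with Twitter's actual tooling.

```python
# Back-of-the-envelope conversion of a daily ingest volume into an
# average sustained throughput.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes)
# and a perfectly even load over 24 hours. The 100 TB/day figure comes
# from the article; everything else is illustrative.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_throughput_gb_per_s(tb_per_day: float) -> float:
    """Return the average ingest rate in GB/s for a given daily volume in TB."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**9


if __name__ == "__main__":
    rate = sustained_throughput_gb_per_s(100)
    print(f"{rate:.2f} GB/s sustained")
```

Under these assumptions, the warehouse would need to absorb a little over one gigabyte every second around the clock, before accounting for peaks, replication, or reprocessing.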
The Twitter application now handles hundreds of millions of tweets every day. This level of volume requires a “deep and complex technology stack and architecture.” There “have probably been 100’s, if not 1000’s, of people working to maintain that for ensuring uptime, performance, speed, and scale.”
Besides that, according to a recent whistleblower report, “Twitter’s infrastructure is already creaky and lacks some of the backup and recovery options that are considered table stakes for businesses operating internet services of its type.” If this report is accurate, Elon Musk’s decision to cut the infrastructure cost could impact the usability of the Twitter app.
Combined with the employee layoffs, these challenges raise the risk that the Twitter app could become unusable. So, how is the company responding? Let’s discuss that next.
How Is the Twitter Team Responding to This Disruption?
Given recent events, one question looms large: is Twitter heading toward a slow and painful end?
At Twitter, engineers have publicly expressed their concern about how Elon Musk’s strategies could cause Twitter to go offline. Twitter employees also argue that Musk has little (or no) knowledge of how Twitter works.
On the positive side, Twitter experts do not expect the social media platform to struggle and go offline forever. They do expect that some systems (those that handle more load than others) may fall silent permanently. Some employees believe that, going forward, Elon Musk may choose to prioritize making the platform work optimally only for celebrities and power users. There is a precedent: Twitter was once reported to have run a dedicated server just for Justin Bieber.
For his part, Elon Musk is also trying to improve his understanding of Twitter’s tech stack. In a recent email, Musk asked Twitter’s software engineers to report to the San Francisco office and provide a summary of what their code has achieved over the past six months. He also plans to speak to remote engineers over video calls.
There are also reports of a “heightened state of activity” in Twitter’s engineering team. After all the layoffs and resignations, the remaining Twitter engineers face the challenge of “keeping the site stable.” According to a recent report, Twitter also plans to set up an engineering team in India. Other reports say Musk is bringing in 50 Tesla engineers to help make sense of the existing tech stack.
Bottom Line – The Application Performance Lessons
To summarize, Twitter offers plenty of lessons on the importance of application performance in the modern era. Extended application downtime can seriously hurt both users and the company’s revenues. Twitter developers will have to find ways to optimize the tech stack with fewer resources. That means treating performance, scalability, and reliability as critical pillars of their designs while keeping the business goals in mind.
This is a precarious and interesting time for Twitter. With the productivity and revenue of businesses that depend on Twitter at stake, and with open questions about how well the technology stack is understood by those now running it, we should expect some interesting updates in the coming months.