“My time at Meta was quite intense because I sometimes had to decide to halt development projects that seemed less effective. I had to weigh those decisions carefully, because a misstep could negatively affect Meta’s revenue and possibly lead to bankruptcy.”
Tuan Vu - Senior Software Engineer at Quora & Admin at Viet Tech
My first job was as a data scientist at an AI company in the US called Viant Technology. The first two years gave me invaluable experience working with big data and machine learning. But in the third year, after the business was acquired by a large corporation, the work culture changed and innovation was no longer the main focus. As a Gemini, I am perpetually curious and enjoy exploring new ideas: reading books, blogging, sharing, creating videos, teaching, and so on. At that point, I thought, “Maybe I should try another job?”
I spent one and a half years exploring a wide variety of fields and opportunities through hands-on experience: starting a business, consulting, writing books, teaching… I started writing a [blog](https://www.tuanavu.com/) and wrote a whole book without publishing it; I wrote it because I liked it, and then left it there. I also started a YouTube channel where I post videos such as [SQL Tutorial for Beginners](https://www.youtube.com/playlist?list=PLYizQ5FvN6ptzOHrJF-ewt8gPT9lqRqtj) and [Apache Airflow Tutorials](https://www.youtube.com/playlist?list=PLYizQ5FvN6psH2HDuC1ynHeMz7Q8z2Emp). Even though I didn’t make that many videos, I still managed to amass 10,000 subscribers (hehe).
After about one and a half years of self-discovery, I realized that those avenues, while personally fulfilling as hobbies, were not suited to a sustainable long-term career. At that point, I already had experience in big data and machine learning, but I wanted to do something new. I was intrigued by the field of infrastructure and wondered how cloud services are deployed for millions of devices and billions of users. I therefore decided that the next step in my career would be to work for one of the largest technology companies dominating this space, such as Google Cloud, Amazon Web Services or Facebook, organizations with the kind of worldwide infrastructure that can reach billions of users.
I joined the Ads Core Infrastructure team at Facebook (Meta). During my two years there, I gained tremendous expertise in resource management and infrastructure development. I was exposed to the entire life cycle and long-term planning involved in hardware and software engineering at a massive global scale. For example, whenever the company wanted to buy new hardware or chips, or build a data centre, we had to forecast and plan at least 4 years in advance. This included predicting user growth and the new services and features required, as well as more detailed technical factors such as ML model size, data, CPU and memory requirements, along with detailed development studies at each phase.
For example, Meta currently has about 2 billion users. At its current growth rate, the company will need 10 million additional servers over the next 4 years: different types of machines with different RAM, chip and hardware configurations. Such procurement must be carefully planned, with orders placed well in advance, because it takes 6 months to 1 year for manufacturers to produce and deliver chips in batches. In addition, I had to calculate the cost of large ML models and the amount of data they need, with data volumes increasing daily and with feature restrictions from Apple. Replacing those restricted features requires far more data to compensate, potentially multiplying costs tenfold just to maintain the same model quality.
My time at Meta was quite intense because resources were constrained and teams competed for more than was available. Without careful planning, infrastructure costs could surpass the company’s revenue. Sometimes I had to make difficult decisions to block development projects that seemed ineffective. I had to weigh those decisions carefully, because one mistake could push Meta into a negative margin and eventually bankruptcy. Managing resources wisely at scale is a huge challenge, so large companies need to consider the long term and not focus only on current profits. At Meta’s enormous scale, effective cost management and resource allocation are extremely important. This was a great lesson in strategic resource allocation and long-term vision, one that I have applied throughout my career.