A neuroscientist, a lecturer and an Oil & Gas specialist – what do they have in common?
Our heartiest congratulations to Team SGRLers, who represented AI Singapore in the Computational Economics Competition hosted by the China Computer Federation (CCF). The team, comprising three AIAP apprentices – Chen Weiqiang and Huang Yuli (AIAP Batch 13) and Benjamin Chew (AIAP Batch 12) – embarked on this four-month-long challenge with the goal of learning and displaying their AI skills. Their efforts were recognised, and they emerged with top honours in the competition.
We caught up with the team in this interview to find out more…
Team SGRLers was ranked 1st (Track 1) and 2nd (Track 2)
Q1: Can you tell us a bit about your team and your roles within the team?
Team SGRLers: All three of us came from different professional backgrounds. Ben was a neuroscientist by training, Weiqiang was an applied mathematician and a lecturer, and Yuli was a mechanical specialist from the oil & gas industry. What we have in common is probably our passion for AI, both in its applications and its theoretical basis.
Given the very dynamic nature of this competition, we did not fix each person's role in the team. Instead, we stayed agile by gathering feedback weekly, breaking down the required tasks and making plans for the following week.
One thing we felt proud and fortunate about was the synergy in the team, coupled with our relatively strong programming skills. As it was uncharted territory for the three of us, our plans could never keep up with the changes. Whenever any one of us discovered something new (this could be something related to coding, reinforcement learning theory or game exploits), we would share it with the others, motivating ourselves and keeping our minds open. We kept exploring new things and were willing to make adaptations regardless of the effort.
YL: Our team was like an autonomous vehicle, and as the team lead I could take my hands off the wheel most of the time, trusting that the team would move in the right direction and arrive at our destination.
Q2: What motivated your team to participate in this AI competition? Were there any specific goals or challenges that you wanted to tackle?
YL: We were all self-taught in the field of reinforcement learning (RL) via different routes. It was once considered the shortest path toward AGI until LLMs took off in the past one to two years. In my opinion, its potential is still underestimated. I went back to robotics research for a year before joining AIAP, and I witnessed how the leading work by people like Pieter Abbeel and Sergey Levine is transforming the entire robotics landscape. The literature also suggests that it is having a significant impact across many fronts of scientific breakthrough, including biology, atomic energy, mathematics and sociology. We took it upon ourselves to build our theoretical foundation in RL, but we all realised that to improve our skills further we needed to practise it, and at the same time validate our learning via an open and fair competition.
Ben: In neuroscience, reinforcement learning remains a popular framework for modelling the interactions between neurotransmitters like dopamine and observed behaviour (e.g. decisions and mood) in response to the environment. Learning and meta-learning remain important concepts that would be useful additions to LLMs today to help develop smarter and more adaptive agents. As a result, I was interested in using the competition to improve my skills in the area.
WQ: I was intrigued by the interplay between mathematics, economics and reinforcement learning, and by the chance to participate in an RL competition based in China, as my past competition experience only involved Kaggle, which typically does not deal with reinforcement learning.
Q3: Why the team SGRLers?
Team SGRLers: The team's name was coined from the fact that we are Singapore Reinforcement Learning learners. SGRLers is the shortest possible acronym that captures the essence – sorry, we are STEM geeks, very unromantic (haha…)
Q4: Can you share a bit more about this competition?
Team SGRLers: This competition was hosted by the China Computer Federation (CCF), one of the most prominent academic bodies in China in the fields of computer science and AI. It was set up with the aim of exploring the potential of decision intelligence in macroeconomics. One of the notable pioneering works was the AI Economist paper published in Science Advances in 2022, and this competition was along the same lines. It created a simplified environment of an economic society, with players like the government, households, firms and banks interacting with each other under a set of macroeconomic rules. The government and households are operated by agents, which exercise either a hardcoded rule-based policy or a policy learned using approaches such as imitation learning, optimization or reinforcement learning.
The competition had two tracks: one to find the best government agent, and the other to find the best household agent. The agents we submitted in each track were put into multiple game simulations, competing or collaborating with agents submitted by other teams to maximise their individual rewards. For example, in the government track, the government agent received observations such as the average wealth, income and productivity of each income group of households in the society.
Based on this information, the government agent would decide on its own spending and on different types of tax rates. The household agents would then react by deciding how much to work and how much to consume. The game went on like this until one of the early termination conditions was triggered, or a fixed number of steps had been reached. At each step of the game, the government was rewarded based on GDP growth and on how well it maintained a fair wealth distribution among the population. To win this track, our submitted government agent needed to master the balance between GDP growth and social equality.
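To make the government track concrete, here is a minimal sketch of what a rule-based government agent could look like under this kind of observation-in, action-out interface. The observation fields, action keys and class name are our own illustrative assumptions, not the actual Jidi competition API; the idea is simply to tax and redistribute more aggressively when measured inequality rises.

```python
import numpy as np

class RuleBasedGovernment:
    """Illustrative government agent: reads household statistics and
    returns a spending ratio plus an income tax rate (names are assumed)."""

    def act(self, obs: dict) -> dict:
        # Hypothetical observation field: average wealth per income group.
        wealth = np.asarray(obs["avg_wealth_per_group"], dtype=float)

        # Measure inequality with a simple Gini coefficient over group wealth.
        gini = self._gini(wealth)

        # Heuristic: tax and spend more when inequality is high,
        # ease off when wealth is already evenly spread.
        income_tax_rate = float(np.clip(0.10 + 0.50 * gini, 0.0, 0.60))
        spending_ratio = float(np.clip(0.30 + 0.30 * gini, 0.0, 0.80))

        return {"income_tax_rate": income_tax_rate,
                "spending_ratio": spending_ratio}

    @staticmethod
    def _gini(x: np.ndarray) -> float:
        # Standard Gini coefficient on non-negative values.
        x = np.sort(np.clip(x, 0.0, None))
        n, total = x.size, x.sum()
        if n == 0 or total == 0:
            return 0.0
        cum = np.cumsum(x)
        return float((n + 1 - 2 * cum.sum() / total) / n)
```

A trained agent would replace the heuristic with a learned policy network, but the basic contract of mapping observations to spending and tax decisions stays the same.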
Q5: Can you provide an overview of the solution that your team developed for the competition? What were the key techniques or algorithms that you employed?
Team SGRLers: The competition was very dynamic. Our competitors' strategies were evolving day by day, and even the rules of the competition went through a major change midway. To cite one example, a competitor who scored highest in one of the warm-up rounds ended up at the bottom of the leaderboard in the final round. Given that this game was a multi-agent setup, there was probably no single optimal policy that trumped the rest; at least, we did not find one within the timeframe of this competition.
Instead of putting all our eggs in one basket, we maintained a league of champion agents and, at any given time, submitted the agent that performed best against the mainstream policies submitted by our competitors. In this league, we had neural-network agents trained using algorithms like independent PPO (proximal policy optimization), PPO with CTDE (centralised training, decentralised execution) and DDPG (deep deterministic policy gradient), as well as our rule-based policy with a secret recipe. In the final round, our submission was based on a rule-based policy that was reverse-engineered from our top competitor, but we added a trick that handled the transient stage well. With that trick, we managed to beat the rest, including that competitor.
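As a rough sketch of the "league" idea under our own simplified assumptions (the function names, evaluation loop and placeholder simulator below are illustrative, not the actual competition harness), selecting a submission amounted to evaluating every candidate against a pool of opponent policies and keeping the best scorer:

```python
import random
from typing import Callable, Dict, List

# A policy maps an observation dict to an action dict. In practice these were
# trained networks (independent PPO, PPO with CTDE, DDPG) or rule-based policies.
Policy = Callable[[dict], dict]

def run_episode(agent: Policy, opponents: List[Policy]) -> float:
    """Placeholder for one game simulation returning the agent's episode
    return; the real evaluation ran on the Jidi platform."""
    return random.random()  # stand-in outcome for illustration

def pick_best_from_league(league: Dict[str, Policy],
                          opponent_pool: List[Policy],
                          episodes: int = 20) -> str:
    """Evaluate each champion against the current opponent pool and return
    the name of the best-performing agent to submit."""
    mean_returns = {
        name: sum(run_episode(agent, opponent_pool)
                  for _ in range(episodes)) / episodes
        for name, agent in league.items()
    }
    return max(mean_returns, key=mean_returns.get)
```

Because the pool of opposing strategies kept shifting, the "best" agent had to be re-selected regularly rather than fixed once.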
Q6: How did the team prepare for the RL competition?
YL: I read up on RL through online courses such as David Silver's UCL lecture series, Richard Sutton's book, RL algorithm papers and DeepMind publications (e.g. AlphaTensor, AlphaStar). MARL I learnt from the literature and a bit of self-study on game theory, as it remains a niche direction in AI that lacks structured learning materials.
Ben: I completed Hugging Face’s Deep Reinforcement Learning Course, dived into David Silver’s Reinforcement Learning lecture series on YouTube, and read up on MARL since the competition featured multiple agents. Although we did not end up going down this route, I also looked up recent advances in the field of Active Inference and Robotics which I briefly encountered during my neuroscience days.
WQ: I started with courses from Hugging Face on Proximal Policy Optimization (PPO), a key algorithm used in Reinforcement Learning from Human Feedback (RLHF). This coursework, combined with practical exercises such as implementing RL algorithms and experimenting with different strategies, helped us develop hands-on skills. Throughout this process, collaborative learning played a significant role, with team members sharing insights and engaging in discussions to deepen our collective understanding and problem-solving capabilities in RL. This blend of theoretical study, practical experience and collaborative exploration equipped the team with the comprehensive understanding of RL needed to excel in the competition.
Q7: Were there any challenges, and how did the team overcome them?
YL: One primary challenge was the steep learning curve associated with mastering RL concepts and applications. To tackle this, team members dedicated considerable time to self-study. One significant advantage we had in this team was that all of us were relatively fast learners. We were efficient in identifying relevant literature, capturing its essence and, most importantly, sharing it effectively with the team. So our rate of knowledge acquisition grew faster than linearly.
Ben: As Yuli mentioned, we mostly came in with some theoretical knowledge that had to be kept updated and implemented in practical terms. The team also devoted a substantial chunk of time towards understanding the game environment as there were times that we had to correct our initial understanding of the game when certain experiments yielded puzzling results.
WQ: Another significant challenge was navigating the unfamiliar format of the Chinese competition platform, Jidi, as well as cultural differences, which was a new experience for Ben and me. To overcome this, the team, under Yuli’s leadership, organised weekly meetings over four months. These meetings provided a platform for collective experimentation, discussion, and clarification of doubts. They enabled our team to not only deepen our understanding of RL but also to strategize effectively for the unique competition format and cultural context. This consistent, collaborative approach was instrumental in adapting to and excelling in the new environment.
Q8: What did you think were the key success factors pertaining to this win? Did getting in the same AIAP programme help in any way?
Team SGRLers: The team’s success in the RL competition can be attributed to several key factors, including effective teamwork, learning from competitors, tenacity in experimentation, and skills honed through our participation in the AI Apprenticeship Programme (AIAP).
Teamwork played a pivotal role, with each member leveraging their unique strengths to contribute to the collective goal. For instance, Weiqiang's quick adaptability and efficiency in coding, Ben's deep knowledge of productivity tools and current RL algorithms, and YL's proficiency in Mandarin for clear communication with the organisers all combined to create a well-rounded and capable team. The team's agility in learning from competitors was also crucial: we closely monitored and analysed competitors' strategies, which allowed us to rapidly adapt and refine our own tactics in response to new challenges and insights.
The AIAP significantly contributed to our success by instilling essential traits such as tenacity in experimentation and constant communication. The rigorous training and experience gained from the programme enabled us to efficiently debug and iterate on our strategies, fostering a culture of persistent improvement and problem-solving. Moreover, the practice of introspection and learning from both our own trials and the strategies of opponents echoes the principles of “reinforcement learning”, reinforcing our skill in adapting and evolving our approaches dynamically. This combination of collaborative strength, adaptive learning, and skills honed through AIAP was instrumental in the team's successful performance in the competition!
Q9: What was the most memorable part of the whole competition?
YL: The most memorable aspect of the competition for the team was the collaborative spirit and the excitement of experimenting with RL strategies. The process of working together as a cohesive unit, sharing ideas, and indulging in rich, sometimes wild, discussions about potential RL approaches created an environment of creativity and innovation. This team dynamic was not just about solving problems but also about the joy of exploring the realm of RL together, with each member bringing their unique insights and perspectives to the table.
Ben: For me, it was the weekly meetings where we spent time going through ideas and strategies as a team. Those meetings often got sidetracked by discussions about the potential of RL and other algorithms for smarter agents, and their subsequent impact on society, which was always fun.
WQ: Another highlight for us was the thrill of submitting our Python scripts and RL agent model weights to the Jidi computational platform and eagerly awaiting the results of our economic simulation experiments. The anticipation and excitement of seeing our strategies come to life and the immediate feedback on our performance made the competition an exhilarating experience. This process of trial, observation, and adaptation, coupled with the collaborative brainstorming sessions, made the competition not just a challenge but a memorable journey of learning and discovery in the field of RL.
Q10: What advice do you have for aspiring data scientists or AI practitioners who want to participate in AI competitions and improve their skills?
YL: Compared with other projects, a competition requires much more effort in studying the environment and your competitors. Perfecting your own approach alone is insufficient. It is typically a huge undertaking, so having a strong team is crucial. Keep your friends close, and enemies closer!
Ben: Be agile, dive into the data and environment, prepare for loads of experimentation, and keep an open mind. We did not set out to win but rather to learn as much as we possibly could.
WQ: I would advocate stepping beyond conventional learning methods and embracing real-world challenges. This means moving beyond standard courses and exercises to experiment with a variety of GitHub repositories and engaging in direct, hands-on problem-solving. While this path may initially lead to frustrations with debugging and executing code, especially in cases where open-source repositories may lack comprehensive documentation, it is crucial for long-term development in AI. Such practical experience exposes you to a range of coding styles and problem-solving techniques. Overcoming these challenges builds not just technical skills but also fosters resilience and adaptability, essential traits in the dynamic field of AI and data science.