On March 10, 2021, our platform experienced intermittent access issues affecting trading services. This report outlines the details of the incident, our response, and ongoing efforts to improve system stability.
What Happened During the System Outage
From 17:56 to 18:15 HKT on March 10, 2021, users encountered intermittent disruptions in spot and margin trading services. These issues impacted all access points, including the web platform, mobile application, and API interfaces.
Our technical team immediately launched an investigation. The root cause was identified as an unexpected stoppage of an internal service critical to the trading system's operation, which led to a temporary system-wide halt.
Timeline of Incident Response and Resolution
A detailed timeline of our team's detection and response efforts is outlined below.
- 17:56 HKT: Our monitoring systems detected the anomaly and triggered automatic alerts. API users began receiving "30030" error codes with the message: "Matching engine is being upgraded. Please try in about 1 minute."
- 17:57 HKT: Our development team activated the emergency incident response protocol to diagnose and locate the source of the fault.
- 18:05 HKT: The root cause was confirmed: a critical internal service supporting the trading engine had unexpectedly stopped.
- 18:07 HKT: Engineers initiated a structured emergency repair process and prepared to restart the affected trading components.
- 18:15 HKT: The restart procedure was completed successfully, and full trading services for both spot and margin markets were restored.
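The intervals between the timeline entries above can be checked with a few lines of Python. This is only an illustrative sketch: the timestamps are copied from the timeline, while the helper names (`minutes_between`, `phase_durations`) and the short event labels are ours, not part of any platform tooling.

```python
from datetime import datetime

# Timestamps (HKT, March 10, 2021) copied from the timeline above.
# The short labels are paraphrases added for this example.
TIMELINE = [
    ("17:56", "anomaly detected"),
    ("17:57", "incident response activated"),
    ("18:05", "root cause confirmed"),
    ("18:07", "repair and restart initiated"),
    ("18:15", "services restored"),
]


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def phase_durations(timeline):
    """Pair each event with the minutes elapsed since the previous event."""
    return [
        (label, minutes_between(prev_time, time))
        for (prev_time, _), (time, label) in zip(timeline, timeline[1:])
    ]


total_minutes = minutes_between(TIMELINE[0][0], TIMELINE[-1][0])

for label, minutes in phase_durations(TIMELINE):
    print(f"{label}: +{minutes} min")
print(f"total: {total_minutes} min")
```

Read this way, diagnosis (17:57 to 18:05) and the restart itself (18:07 to 18:15) each took about 8 minutes and together account for most of the interruption.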
Our Commitment to Platform Stability and Uptime
We are dedicated to providing a reliable, 24/7 trading environment for all our users. Because high-performance trading systems are extremely complex, however, we cannot guarantee 100% uptime. We continue to invest in our infrastructure to reduce both the likelihood and the impact of such events.
Our ongoing initiatives to improve system resilience include:
- Enhanced Quality Assurance: We have strengthened our testing frameworks and protocols. All new feature code must now run stably for a sustained period in a simulated environment before being deployed to the live market.
- Architectural Upgrades: We are implementing multi-machine and multi-region high-availability solutions. This design is intended to significantly reduce potential downtime caused by hardware or software failures.
- Advanced Update Procedures: We have developed hot-update capabilities for stateless logic, allowing for certain upgrades to occur without any disruption to user trading activities.
How to Stay Informed on System Status and Updates
Transparent communication is a priority for us. We provide updates through the following channels:
- Status Page: Following any incident, a detailed report and explanation are published on our official Status page.
- Proactive Notifications: For any planned system upgrades or unforeseen events, we post announcements on the Status page. These are also communicated through our various market and community channels, including dedicated groups for API and retail users.
- API Channels: API users can subscribe to the system/status channel to receive real-time notifications and alerts directly.
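For API users, subscribing to the system/status channel might look roughly like the sketch below. Only the channel name comes from this report. The message envelope (`op` and `args` for the request, `table` and `data` for notifications) is an assumption made for illustration, as are the function names, so please confirm the exact format against the API documentation before relying on it. The code only builds and parses messages; sending them over a WebSocket connection is left to the client library you already use.

```python
import json

STATUS_CHANNEL = "system/status"  # channel name taken from this report


def build_subscribe_message(channel: str = STATUS_CHANNEL) -> str:
    """Build a subscription request as a JSON string.

    Assumption: the API accepts an {"op": "subscribe", "args": [...]}
    envelope. Verify this against the official API documentation.
    """
    return json.dumps({"op": "subscribe", "args": [channel]})


def extract_status_events(raw_message: str, channel: str = STATUS_CHANNEL) -> list:
    """Return the entries of a status notification, or [] for anything else.

    Assumption: notifications carry the channel name under "table" and
    their payload as a list under "data" (hypothetical field names).
    """
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return []
    if not isinstance(message, dict) or message.get("table") != channel:
        return []
    data = message.get("data", [])
    return data if isinstance(data, list) else []
```

Returning an empty list for unrelated or malformed messages lets the same handler sit on a connection that also carries market data.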
Frequently Asked Questions
What was the specific error users saw during the outage?
API users received a "30030" error code with a message indicating the matching engine was undergoing upgrades. Users on the web and mobile apps experienced an inability to execute trades or access certain trading functions during the brief window.
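API clients may want to treat the "30030" code as a temporary condition and retry rather than fail outright. The following is a minimal sketch of one way to do that with capped exponential backoff. It assumes a hypothetical `request_fn` that returns the parsed response as a dictionary with the code under an `error_code` key; the function name, the key name, and the default delays are all illustrative choices, not part of our API.

```python
import time

ENGINE_UPGRADING = "30030"  # error code reported during the incident


def call_with_retry(request_fn, max_attempts=5, base_delay=1.0,
                    max_delay=60.0, sleep=time.sleep):
    """Call request_fn until it stops returning the 30030 code.

    Assumption: request_fn() returns a dict, and a temporary engine halt
    is signalled by an "error_code" entry equal to 30030 (hypothetical
    response shape). The last response is returned if attempts run out.
    """
    response = None
    for attempt in range(max_attempts):
        response = request_fn()
        if str(response.get("error_code")) != ENGINE_UPGRADING:
            return response
        if attempt < max_attempts - 1:
            # Double the wait after each failed attempt, up to max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
    return response
```

Because the error message suggests retrying in about one minute, callers may prefer a larger `base_delay` than the default used here. The `sleep` parameter exists so the waiting behaviour can be replaced in tests.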
How long did the service disruption actually last?
The total duration of the trading service interruption was approximately 19 minutes, from 17:56 to 18:15 HKT. Our teams worked to resolve the issue as quickly as possible.
What has been done to prevent this exact issue from happening again?
We have reviewed and reinforced the monitoring and failover mechanisms for the specific internal service that failed. This is part of our broader, continuous investment in system architecture to enhance overall stability and prevent similar occurrences.
Where is the most reliable place to check for real-time system status?
The official Status page is the primary and most reliable source for confirmed information on system performance, ongoing incidents, and planned maintenance schedules.
Does this incident affect the security of user funds?
No. The outage was related to trading service availability only. All user funds remained secure throughout the incident and were never at risk.
How can traders stay updated during future events?
We recommend bookmarking the Status page and, for API users, subscribing to the system/status channel to receive real-time notifications.