I’m currently working on pulling real-time market data using a WebSocket connection in Python, based on the sample code below:
https://github.com/Abdullah-2906/LSEG-python-websocket/blob/main/streaming.py
I’d appreciate some guidance on best practices and optimization.
Current setup
- Using Python with a WebSocket connection to stream data
- Subscribed to ~1200 RICs
- Incoming data is pushed to Redis
- A Celery worker processes the data and updates the database
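For reference, here is a stripped-down sketch of the ingest side of that pipeline (assuming websocket-client and redis-py, roughly in the style of the linked streaming.py; the endpoint URL and the Redis key name are placeholders, not my real values):

```python
import json

def route_message(raw, sink, key="market:ticks"):
    """Parse one WebSocket frame and push each update onto `sink`
    (anything with lpush(key, value), e.g. a redis.Redis client).
    In the tr_json2 protocol a frame is a JSON array of messages."""
    updates = json.loads(raw)
    for update in updates:
        sink.lpush(key, json.dumps(update))
    return len(updates)

def main():
    # Hypothetical wiring, shown for illustration only;
    # needs `pip install websocket-client redis`.
    import redis
    import websocket
    r = redis.Redis()
    ws = websocket.WebSocketApp(
        "wss://example-host:443/WebSocket",   # placeholder endpoint
        on_message=lambda ws, msg: route_message(msg, r),
        subprotocols=["tr_json2"],
    )
    ws.run_forever()
```

The idea is that the on_message callback does nothing except push the raw payload into Redis, so the socket reader never blocks on parsing or database work.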
Issues I’m facing
The WebSocket connection drops after some time
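One pattern I'm considering for this is a reconnect loop with jittered exponential backoff plus protocol-level keepalive pings (sketch below, built on websocket-client's run_forever; after each reconnect the on_open handler would of course also have to resend the login request and re-subscribe all the RICs). Is this the right direction?

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: a uniform random delay
    in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def stream_forever(make_app, healthy_secs=60.0):
    """Reconnect loop around websocket-client's run_forever(), which
    returns whenever the socket closes. ping_interval/ping_timeout make
    the client send pings so a half-dead connection is detected and
    torn down instead of hanging silently."""
    attempt = 0
    while True:
        started = time.monotonic()
        # make_app() builds a fresh websocket.WebSocketApp each time,
        # with on_open re-sending login + subscriptions.
        make_app().run_forever(ping_interval=30, ping_timeout=10)
        if time.monotonic() - started > healthy_secs:
            attempt = 0  # the session lasted a while; reset the backoff
        time.sleep(backoff_delay(attempt))
        attempt += 1
```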
Questions
- What is the most reliable way to maintain a stable WebSocket connection for real-time data?
- Should the connection remain open continuously, or is it better to reconnect periodically (e.g., at market open/close)?
- Is Python suitable for this scale, or would another language (e.g., Go, Node.js, Java) perform better for handling high-frequency streaming data?
- What are the best practices for handling large subscriptions (~1200 instruments)?
- Are there recommended patterns for buffering/queueing (e.g., Redis) before writing to a database?
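On the last point, this is the kind of consumer-side batching I have in mind: one blocking pop so an idle worker sleeps cheaply, then drain up to a batch size, then a single bulk database write. The key name and batch size are just placeholders. Does this look reasonable?

```python
def drain_batch(conn, key="market:ticks", max_items=500, block_secs=1):
    """Pop up to max_items raw messages from a Redis list: one blocking
    BRPOP, then non-blocking RPOPs until the list is empty or the batch
    is full. `conn` is a redis.Redis client (or anything exposing
    brpop/rpop with the same return shapes)."""
    first = conn.brpop(key, timeout=block_secs)
    if first is None:
        return []                  # nothing arrived within the timeout
    batch = [first[1]]             # brpop returns a (key, value) pair
    while len(batch) < max_items:
        item = conn.rpop(key)
        if item is None:
            break
        batch.append(item)
    return batch
```

The worker would then decode the whole batch and write it in one transaction (executemany or COPY) rather than one INSERT per tick.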
Goal
I’m trying to build a robust and scalable pipeline where:
- Data is streamed reliably without frequent disconnects
- Processing is efficient and non-blocking
- The system can handle high throughput without losing messages
Any advice, sample architectures, or code examples would be really helpful.
If another language (e.g., C) works better for this, I'd be happy to switch, but I'd need a sample example covering both the streaming side and the database updates.