Load Testing Reports - Scaling Conversations and Feeds
1. Introduction
We at LikeMinds wanted to evaluate how our Chat and Feed services perform when users interact concurrently. With anticipated user growth and increasing reliance on real-time communication and content sharing, this stress test was key to understanding where our backend infrastructure shines—and where it strains.
2. How We Structured the Test
We tested three environments with progressively stronger infrastructure. Below is a detailed comparison of the setups, covering CPU, memory, and service layout:
Component | Setup 1 (Baseline) | Setup 2 (Optimized DB Compute) | Setup 3 (Feed Handling Optimized) |
---|---|---|---|
API Gateway (Kettle) | 1 pod, 1 GiB RAM, 2 vCPU | 1 pod, 1 GiB RAM, 2 vCPU | 1 pod, 1 GiB RAM, 2 vCPU |
Chat Service (Caravan) | 2 pods, 3 GiB RAM each, 5 vCPU total | 2 pods, 3 GiB RAM each, 5 vCPU total | 1 pod, 3 GiB RAM, 4.5 vCPU |
Background Worker | 1 Celery pod, 1 GiB RAM, 1–4 vCPU (dynamic) | 1 Celery pod, 1 GiB RAM, 1–4 vCPU (dynamic) | 1 Swarm Worker pod, 1 GiB RAM, 1–3 vCPU |
Feed Service (Swarm) | 2 pods, 1 GiB RAM each, 2 vCPU | 2 pods, 1 GiB RAM each, 2 vCPU | 5 pods, 3 GiB RAM each, 2–5 vCPU |
Rate Limiter (Skulk) | 1 pod, 1 GiB RAM, 1 vCPU | 1 pod, 1 GiB RAM, 1 vCPU | 1 pod, 1 GiB RAM, 1 vCPU |
Database Type | PostgreSQL | PostgreSQL | Azure Cosmos DB (MongoDB vCore) |
Database Spec | 2 vCores, 8 GiB RAM | 4 vCores, 16 GiB RAM | 2 instances, 4 GiB RAM each (replicated MongoDB cluster) |
CPU Observed | Peak: 8.5%, Avg: ~5.3% | Peak: 5.62%, Avg: ~1.8% | Peak: 16%, Avg: ~5.2% |
Memory Usage | Baseline: ~25%, Stable under load | Baseline: ~32%, Stable under load | Baseline: ~28%, Slight increase during peak loads |
We simulated 100, 500, and 1000 users clicking, commenting, liking, and chatting—mimicking a real-life scenario where users are active concurrently.
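To make the workload concrete, here is a minimal sketch of how such a concurrent-user mix could be scripted with Locust, shown purely for illustration. The endpoint paths, task weights, and payloads are placeholders, not our actual routes:

```python
from locust import HttpUser, task, between


class CommunityUser(HttpUser):
    """Simulated member who browses chatrooms and interacts with the feed."""
    wait_time = between(1, 3)  # think time between actions

    @task(3)
    def view_chatrooms(self):
        # Placeholder route standing in for the Chat service's listing endpoint
        self.client.get("/chatrooms")

    @task(2)
    def like_a_post(self):
        self.client.post("/feed/posts/123/like")  # placeholder post ID and route

    @task(1)
    def comment_on_post(self):
        self.client.post("/feed/posts/123/comments", json={"text": "Nice post!"})

    @task(1)
    def create_post(self):
        self.client.post("/feed/posts", json={"text": "Load-test post"})
```

A run such as `locust -f loadtest.py --host https://<staging-host> -u 1000 -r 50` would ramp up to 1000 concurrent users at 50 users per second; the 100- and 500-user runs only change the `-u` value.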
3. Performance Metrics and Observations
Chatroom Performance
Action | Concurrent Users | Avg. Response Time (ms) | Throughput (requests/sec) |
---|---|---|---|
View Chatrooms | 1000 | 4955 | 99.2 |
Check Member Status | 1000 | 4543 | 120.8 |
DM Status Lookup | 1000 | 4618 | 82.4 |
Upgrading to Setup 2 brought a modest improvement in response times and reduced CPU pressure; memory usage stayed stable throughout.
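For clarity on how the two columns in these tables relate, here is one straightforward way to derive average response time and throughput from raw per-request samples. The tuple layout is an assumption about the test client's output, not a fixed format:

```python
from statistics import mean


def summarize(samples):
    """samples: list of (start_timestamp_s, elapsed_ms) pairs from the test client."""
    latencies = [elapsed for _, elapsed in samples]
    starts = [start for start, _ in samples]
    duration_s = max(starts) - min(starts) or 1  # avoid div-by-zero on tiny runs
    return {
        "avg_response_ms": round(mean(latencies), 1),
        "throughput_rps": round(len(samples) / duration_s, 1),
    }
```

Throughput here simply counts completed requests over the active test window.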
Chat Features (Join, Leave, Mute, etc.)
Action | Concurrent Users | Avg. Response Time (ms) | Throughput (requests/sec) |
---|---|---|---|
Join Chatroom | 1000 | 4743 | 81.6 |
Leave Chatroom (Private) | 1000 | 4627 | 82.9 |
View Participants | 1000 | 4755 | 80.8 |
Mute Notifications | 1000 | 6305 | 75.6 |
Mute and topic-related actions tended to be slower, suggesting additional backend processing.
Feed Handling (Content Posting and Interaction)
Action | Concurrent Users | Avg. Response Time (ms) | Throughput (requests/sec) |
---|---|---|---|
Create a Post | 1000 | 57535 | 13.9 |
Like a Post | 1000 | 35877 | 19.5 |
Comment on a Post | 1000 | 32873 | 20.2 |
The most demanding actions were content-related. Creating posts at high volume led to nearly half of the requests timing out, especially under the heaviest load.
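Because timeouts dominate this scenario, it helps when the test client counts them separately from other failures. A minimal sketch, assuming a 30-second client-side timeout (the actual threshold in our runs may differ), a placeholder route, and a `collections.Counter` for tallying:

```python
from collections import Counter

import requests

TIMEOUT_S = 30  # assumed client-side limit; the threshold used in our runs may differ


def create_post(session, base_url, payload, stats: Counter):
    """Attempt one post creation and classify the outcome for the report."""
    try:
        # Placeholder route for the Feed service's create-post endpoint
        resp = session.post(f"{base_url}/feed/posts", json=payload, timeout=TIMEOUT_S)
        stats["ok" if resp.ok else "error"] += 1
    except requests.Timeout:
        stats["timeout"] += 1  # tallied separately so the timeout rate stays visible
    except requests.RequestException:
        stats["error"] += 1
```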
4. Unique Users Simulation
To better replicate a production environment, we re-ran the tests with a distinct user for each request. For example, 1000 different users each created one post (1000 posts in total). The results were revealing:
Action | Users | Avg. Response Time (ms) | Throughput (requests/sec) |
---|---|---|---|
Create a Post | 1000 | 27074 | 20.7 |
Like a Post | 1000 | 2769 | 26.0 |
Comment on a Post | 1000 | 52242 | 15.6 |
Spreading the load across distinct users substantially reduced response times and improved throughput for post creation and likes; commenting, however, was slower in this run and remains a clear optimization target.
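Operationally, the only difference in the unique-user runs is that each virtual user acts under its own identity. A rough sketch of pre-provisioning distinct test accounts before a run; the registration endpoint and response field are hypothetical placeholders:

```python
import uuid

import requests


def provision_test_users(base_url, count=1000):
    """Create `count` distinct test accounts and return one auth token per user."""
    tokens = []
    for _ in range(count):
        resp = requests.post(
            f"{base_url}/auth/register",  # hypothetical registration endpoint
            json={
                "username": f"loadtest-{uuid.uuid4().hex[:8]}",
                "password": "load-test-only",
            },
            timeout=10,
        )
        resp.raise_for_status()
        tokens.append(resp.json()["token"])  # assumed response field
    return tokens
```

Each virtual user then authenticates with its own token, so the 1000 posts, likes, and comments are attributed to 1000 different accounts rather than one shared identity.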
5. Key Learnings
- Service Stability: The services handled up to 500 concurrent users comfortably. At 1000, latencies rose, especially for content-heavy operations, but the services still managed the load effectively.
- Infrastructure Scaling: Doubling database resources significantly lowered CPU load without increasing memory usage.
- Caching Impact: When data wasn't cached (i.e., it was being accessed for the first time), latencies were higher; services relying on cached data performed more predictably (see the read-through sketch below).
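The first-access penalty noted in the last point is typical of read-through caching. A generic, Redis-backed sketch of the pattern, shown for illustration rather than as our production code:

```python
import json

import redis

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_S = 300  # assumed TTL; tune per data type


def get_chatroom(chatroom_id, load_from_db):
    """Return a chatroom, serving from Redis when possible (read-through)."""
    key = f"chatroom:{chatroom_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # fast path: already cached
    chatroom = load_from_db(chatroom_id)   # slow path: first access hits the DB
    cache.setex(key, CACHE_TTL_S, json.dumps(chatroom))
    return chatroom
```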
6. What’s Next
- Further Optimization: Post creation and commenting need more attention to prevent timeouts.
- Smart Load Distribution: Adding smarter load balancing mechanisms may help maintain performance beyond 1000 users.
- Monitoring and Auto-scaling: Real-time infrastructure scaling during traffic surges could prevent resource exhaustion.
7. Final Thoughts
While the services held strong under realistic load conditions, the results gave clear indications of where we should optimize further. By testing various usage scenarios and infrastructure combinations, we now have a clearer roadmap for scaling our platform sustainably.