Casperhr · February 18, 2018 20:16
diff --git a/riide-scaling b/riide-scaling
 # Scaling plan for Riide Backend
 Last updated 16 Feb 2018

 ### Scaling bottlenecks
 
 - MySQL is growing rapidly: 52 GB as of Feb 2018, growing 2-3 GB a week at the moment. This is not an urgent issue, but we need to handle it before the database grows above 600 GB. The earlier we solve this, the faster the fix will be, since there is less data to migrate.
    - It is possible to just buy a bigger server, but restoring backups and scaling horizontally will eventually take days.
 - MySQL runs on a single RDS server that handles all traffic. Horizontal scaling should be looked into.
 - Queues: jobs are too big or take too long.
 - The user_update jobs have a 1:N issue: we are synchronizing all users with each company and pulling Icabbi bookings from each company.
    - Throwing more server power at this could solve it.
 - The fallback system loops over every active booking every 5 min (configurable), which is a 1:N issue again.
    - Throwing more server power at this could solve it.
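
To put the MySQL growth figures in perspective, here is a rough headroom calculation. It is only a sketch: it assumes growth stays linear at the 2-3 GB a week quoted above, which will probably not hold as more companies are added, and the function name is made up for illustration.

```python
# Figures from the plan: 52 GB in Feb 2018, 2-3 GB growth per week, 600 GB ceiling.
CURRENT_GB = 52
LIMIT_GB = 600
GROWTH_GB_PER_WEEK = (2, 3)  # slow and fast estimates


def weeks_until_limit(current_gb, limit_gb, growth_per_week):
    """Weeks of headroom left, assuming constant (linear) growth."""
    return (limit_gb - current_gb) / growth_per_week


slow, fast = (weeks_until_limit(CURRENT_GB, LIMIT_GB, g) for g in GROWTH_GB_PER_WEEK)
print(f"{fast:.0f} to {slow:.0f} weeks of headroom")  # prints: 183 to 274 weeks of headroom
```

Under these assumptions there are several years of headroom, which matches the "not urgent" assessment. The migration cost still grows with every week of delay.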
 
 ### Other issues     
 
 - There is no application monitoring. We have to notice issues and replicate them before we can work on a fix (unless it is a crash, which shows up in Bugsnag).
 - Peak periods are always at the weekends, when we are not working.
 - The Icabbi integration is no longer a separate provider; we have coupled the systems very tightly.
 - New companies are added and set up with errors, and the resulting alerts are not getting resolved.
 - There is an overlap between webhooks and our fallback system, which is the cause of several bugs:
    - Webhook delay is not always working
    - Double charges (some cases)
    - Drivers getting payment success and then failed
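
One common way to make two overlapping paths safe is an idempotency guard: whichever of the webhook or the fallback arrives first performs the charge, and the other becomes a no-op. The sketch below is an assumption about how that could look, not the current Riide code. `PaymentLedger`, `charge_once` and the in-memory dict are hypothetical, and it assumes the double charges come from both paths processing the same booking.

```python
class PaymentLedger:
    """Hypothetical guard: records at most one charge per booking."""

    def __init__(self):
        # booking_id -> source ("webhook" or "fallback") that performed the charge
        self._charged = {}

    def charge_once(self, booking_id, source):
        """Return True if this call performed the charge, False if it was a duplicate."""
        if booking_id in self._charged:
            return False
        self._charged[booking_id] = source
        return True


ledger = PaymentLedger()
print(ledger.charge_once("booking-1", "webhook"))   # True: first path charges
print(ledger.charge_once("booking-1", "fallback"))  # False: duplicate is skipped
```

In a real system the check and the write would need to be atomic, for example through a unique constraint on the booking id in the database, since the webhook and the fallback can run at the same time.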
 
 ### Suggestions
 
 - Install New Relic APM: the best way to monitor how the app is used and where to spend time on optimization (API calls).
 - Set up Elasticsearch.
 - Move all historical and non-mission-critical data (booking updates, payment updates, webhook requests, booking locations, etc.) out of MySQL into Elasticsearch.
 - Build or set up a queue / command monitoring tool. We need an overview of what is running when, and whether jobs finish in time. Afterwards, optimize them.
 - Optimize MySQL
    - Look into the booking object, to see whether e.g. locations / directions can be removed or moved to either Elasticsearch or files
    - Look into booking indexes (2.1 GB)
    - Order without index (While there is nothing wrong with a high amount of row sorting, you might want to make sure that the queries which require a lot of sorting use indexed columns in the ORDER BY clause, as this will result in much faster sorting.)
    - Joins without index (This means that joins are doing full table scans. Adding indexes for the columns being used in the join conditions will greatly speed up table joins.)
    - Optimize tables (function)
    - Run through config update suggestions
 - Improve clean-up scripts
 - Set up a read / write system for MySQL, for horizontal scaling
    - Consider Aurora DB (Amazon's new MySQL; 5.7 support since 15 Feb)
 - Improve the dashboard
 - Improve the alert system
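
As a starting point for the queue / command monitoring tool, the sketch below records how long each job runs and flags the ones that exceed a time budget. `JobMonitor`, `track` and the 300-second budget are hypothetical names and values chosen for illustration; a real tool would persist the runs and feed the dashboard or alert system.

```python
import time
from contextlib import contextmanager


class JobMonitor:
    """Hypothetical in-memory monitor: records job durations and budget overruns."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the monitor can be tested without sleeping
        self.runs = []       # list of (name, duration_seconds, overran)

    @contextmanager
    def track(self, name, budget_seconds):
        start = self._clock()
        try:
            yield
        finally:
            # Record the run even if the job raised an exception.
            duration = self._clock() - start
            self.runs.append((name, duration, duration > budget_seconds))

    def overruns(self):
        """Names of jobs that took longer than their budget."""
        return [name for name, _, overran in self.runs if overran]


monitor = JobMonitor()
with monitor.track("user_update", budget_seconds=300):
    pass  # the job body would run here
print(monitor.overruns())
```

A budget of 300 seconds mirrors the 5 min fallback interval: a job that takes longer than its own schedule interval will start to overlap with the next run.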

 
 
 ### Prices
 
  - New Relic - £400 (waiting for an offer)
  - Elastic - £300 ($250 prod, $58 staging)
  - Set up read / write system for MySQL - £750-1500 ($950 per instance)
  - 3 more replicas - £580 ($750)
  
 ### Development estimates
 
  - Set up New Relic - 10h
  - Set up Elastic - 20h
  - Build queue / command monitoring tool - 30h
  - Optimize MySQL - 30h
  - Improve clean-up scripts - 15h
  - Set up read / write system for MySQL - 75h
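
For planning purposes, the estimates above can be totalled as follows. The 8-hour working day is an assumption, not a figure from the plan.

```python
# Development estimates from the list above, in hours.
ESTIMATES_HOURS = {
    "Set up New Relic": 10,
    "Set up Elastic": 20,
    "Build queue / command monitoring tool": 30,
    "Optimize MySQL": 30,
    "Improve clean-up scripts": 15,
    "Set up read / write system for MySQL": 75,
}

HOURS_PER_DAY = 8  # assumption: one developer, 8-hour days

total_hours = sum(ESTIMATES_HOURS.values())
print(f"{total_hours}h, about {total_hours / HOURS_PER_DAY:.1f} working days")
# prints: 180h, about 22.5 working days
```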