@Casperhr
Created February 18, 2018 20:16
# Scaling plan for Riide Backend
Last updated 16 Feb 2018
### Scaling bottlenecks
- MySQL is growing rapidly: 52 GB as of Feb 2018, growing 2-3 GB a week at the moment. This is not an urgent issue, but we need to handle it before the database grows above 600 GB. The earlier we solve it, the faster the fix will be, since there is less data to migrate.
  - It is possible to just buy a bigger server, but restoring backups and scaling horizontally will then take days.
- MySQL is a single RDS server that handles all traffic. Horizontal scaling should be looked into.
- Queues: jobs are too big or take too long.
  - user_update jobs have a 1:N issue: we synchronize all users with each company and pull Icabbi bookings from each company.
    - Throwing more server power at this could solve it.
  - The fallback system loops over every active booking every 5 minutes (configurable), which is the same 1:N issue.
    - Throwing more server power at this could solve it.
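
As a rough sanity check on the MySQL growth figures above, the runway until the 600 GB mark can be estimated. This is a minimal sketch that assumes growth stays linear at the quoted 2-3 GB a week, which may be optimistic if new companies keep being added. The function name `weeks_until` is made up for illustration.

```python
def weeks_until(limit_gb, current_gb, growth_gb_per_week):
    """Weeks left before the database reaches limit_gb, assuming linear growth."""
    return (limit_gb - current_gb) / growth_gb_per_week


CURRENT_GB = 52  # size in Feb 2018, from the notes above
LIMIT_GB = 600   # size we want to stay below, from the notes above

for rate in (2, 3):  # GB per week, the quoted range
    weeks = weeks_until(LIMIT_GB, CURRENT_GB, rate)
    print(f"{rate} GB/week: {weeks:.0f} weeks (~{weeks / 52:.1f} years)")
```

Under that assumption the limit is years away rather than months, which fits the point that this is not urgent. The cost of migrating still grows with every week of delay.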
### Other issues
- No application monitoring: we have to notice issues and replicate them before we can work on a fix, unless it is a crash (reported by Bugsnag).
- Peak periods are always at the weekends, when we are not working.
- The Icabbi integration is not a provider anymore: we have coupled the systems very tightly.
- New companies are added and set up with errors, and the resulting alerts are not getting resolved.
- There is an overlap between webhooks and our fallback system, which is the cause of several bugs:
  - Webhook delay is not always working
  - Double charges (in some cases)
  - Drivers getting a payment success and then a failure
### Suggestions
- Install New Relic APM: the best way to monitor how the app is used and where to spend time on optimization (API calls).
- Set up Elasticsearch
  - Move all historical / non-mission-critical data (booking updates, payment updates, webhook requests, booking locations etc.) out of MySQL and into Elasticsearch.
- Build or set up a queue / command monitoring tool. We need an overview of what is running when, and whether jobs finish in time. Afterwards, optimize them.
- MySQL optimize
  - Look into the booking object, to see whether e.g. locations / directions can be removed or moved to either Elasticsearch or files
  - Look into booking indexes (2.1 GB)
  - Sorting without an index: while there is nothing wrong with a high amount of row sorting, make sure that queries which require a lot of sorting use indexed columns in the ORDER BY clause, as this results in much faster sorting.
  - Joins without an index: these joins are doing full table scans. Adding indexes for the columns used in the join conditions will greatly speed up table joins.
  - Optimize tables (function)
  - Run through the config update suggestions
- Improve clean-up scripts
- Set up read / write system for MySQL, for horizontal scaling
- Consider Aurora (Amazon's new MySQL-compatible database; MySQL 5.7 support since 15 Feb)
- Improve the dashboard
- Improve the alert system
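
To make the queue / command monitoring suggestion above more concrete, here is a minimal sketch of the core idea: record how long each job run takes and flag jobs that exceed a time budget. It is an in-memory illustration only, not the planned tool. The class `JobMonitor`, its methods, and the job name `fallback_loop` are hypothetical; a real version would persist runs and hook into the existing queue workers.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class JobMonitor:
    """Collects run durations per job name (in-memory, illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                 # injectable so it can be tested
        self.durations = defaultdict(list)  # job name -> list of seconds

    @contextmanager
    def track(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the job raised an exception.
            self.durations[name].append(self._clock() - start)

    def overruns(self, budgets):
        """Return {job: slowest run} for jobs whose slowest run exceeded its budget."""
        return {
            name: max(runs)
            for name, runs in self.durations.items()
            if name in budgets and max(runs) > budgets[name]
        }


monitor = JobMonitor()
with monitor.track("fallback_loop"):  # hypothetical job name
    pass  # the real job body would run here

# 300 s matches the 5-minute fallback interval: a run longer than that
# would overlap with the next one.
print(monitor.overruns({"fallback_loop": 300}))
```

Using the loop interval as the budget is one possible choice: it turns "do they finish in time" into a concrete check that can feed the alert system.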
### Prices
- New Relic - £400 (waiting for an offer)
- Elastic - £300 ($250 prod, $58 staging)
- Set up read / write system for MySQL - £750-1500 ($950 per instance)
- 3 more replicas - £580 ($750)
### Development estimates
- Set up New Relic - 10h
- Set up Elasticsearch - 20h
- Build queue / command monitoring tool - 30h
- MySQL optimize - 30h
- Improve clean-up scripts - 15h
- Set up read / write system for MySQL - 75h
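
For planning purposes, the estimates above can be summed. This is a small sketch; the 8-hour working day used for the conversion is an assumption, not something stated in this plan.

```python
# Hours per task, copied from the list above.
estimates_h = {
    "Set up New Relic": 10,
    "Set up Elasticsearch": 20,
    "Build queue / command monitoring tool": 30,
    "MySQL optimize": 30,
    "Improve clean-up scripts": 15,
    "Set up read / write system for MySQL": 75,
}

total_h = sum(estimates_h.values())
HOURS_PER_DAY = 8  # assumption for illustration

print(f"Total: {total_h}h (~{total_h / HOURS_PER_DAY:.1f} working days)")
```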