- Release velocity suffers because dependencies between components slow down development, testing and deployment
- Components have different scaling characteristics and traffic patterns, so running them together makes use of the underlying hardware inefficient and unreliable
- Capacity planning becomes hard
- Performance becomes unpredictable
- Resource exhaustion happens frequently and randomly
- The component needs to be developed and scaled independently and made available as a service
Architecture |
|
|
|
12 factor app |
https://12factor.net |
|
Availability |
How is the service fault tolerant? |
|
Scalability |
How does the service scale horizontally and vertically? |
|
Statelessness |
Is the service stateless? |
|
Async |
Can it use Lambda / Async services? |
|
Security Considerations |
2FA, HTTPS, Tokens, Encryption, GDPR, Penetration testing, App testing |
|
API |
Contracts, Versioning, Dependency |
|
Network |
• Proxy • Sync, Async, Batch • Multithreaded, Event based, Coroutine |
|
Load Handling |
• Load balancer • Circuit breaker • Throttling |
|
Replication |
Consistency |
|
Data |
• Transactions across services • Partitioning • Schema, Metadata, Evolution • Indexing, Querying • DB type |
|
Caching |
• Object caching • Page Caching |
|
Service Mesh |
• Istio |
|
Shutdown |
Graceful shutdown |
|
i18n Considerations |
|
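The Load Handling row above lists a circuit breaker without saying how one behaves. The following is only a minimal sketch, assuming a failure threshold and a reset timeout. The names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, `reset_timeout` and the injectable `clock` are hypothetical and do not come from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; names and defaults are assumptions."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still inside the cool-down window: fail fast.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function talks to the downstream service. Throttling and load balancing, the other items in that row, are separate concerns and are not covered by this sketch.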
SRE |
|
|
|
Backup / Restore |
• RPO - Recovery Point Objective • RTO - Recovery Time Objective |
|
Reliability |
• MTTF - Mean Time To Failure • MTTR - Mean Time To Recovery • MTBF - Mean Time Between Failures • Uptime • Fault tolerance |
|
Performance / SLAs |
• SLOs - Service Level Objectives • Response time • Latency • Throughput • Uptime |
|
Release Management, Change Management, Config Management |
• Zero-downtime upgrades • Rolling deployments • Automated deployments |
|
Container and Orchestration |
Docker / Docker Swarm or K8S |
|
Dev / QA environment |
Automated Dev / QA environments |
|
CI/CD pipeline |
Code Deploy, Circle CI, Codeship, Jenkins |
|
Upgrades / (0 Downtime) |
Zero downtime upgrade, Rolling upgrades, Canary rollout |
|
Deployment |
Ansible / Puppet |
|
Service Monitoring & Alerting |
Pingdom, Nagios, CloudWatch, Prometheus, DataDog |
|
Logging |
Logstash, Fluentd |
|
Cost |
Cost tags, Analytics, Cost structure, Reserved Instances, Projections, Cost Optimisations (Tools like Botmetrics) |
|
Capacity Planning |
|
|
Security |
IAM Roles, Encryption, HTTPS |
|
Networking |
Diagram, VPC |
|
Fleet management |
Tagging, AMI images, Versions, Upgrades, Consolidation, Pruning |
|
Incident Management and Incident Response |
Outages, Load Management, Latency, Security Incidents |
|
Process Management |
Process group, Process monitoring |
|
OnCall |
Pager Duty, VictorOps |
|
Versioning and Packaging |
|
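The Reliability and Performance / SLAs rows above name MTTF, MTTR and uptime without relating them. As a rough sketch, steady-state availability can be estimated as MTTF / (MTTF + MTTR), and an uptime target implies a downtime budget for a given period. The function names and the 30-day default period below are assumptions made for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is up."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def allowed_downtime_minutes(uptime_target, period_days=30):
    """Downtime budget, in minutes, implied by an uptime target over a period."""
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be a fraction between 0 and 1")
    return (1.0 - uptime_target) * period_days * 24 * 60
```

For instance, a service that runs 999 hours between failures and takes 1 hour to recover has an availability of about 0.999, and a 99.9% uptime target over 30 days leaves about 43.2 minutes of downtime.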
Dev Process |
|
|
|
Git Flow |
Branching and Development process |
|
API Docs |
Swagger |
|
Sentry |
Error monitoring |
|
Metrics |
Concurrency, System metrics, Engineering Metrics |
|
Testing |
• Automation • API testing • Integration • Load • Unit testing • Deployment testing • Checklist • Regression |
General |
|
|
|
Language Version |
E.g. Python 3.x / Java 7 |
|
Framework Version |
E.g. Django version |
|
Library Version |
E.g. PyMongo version |
|
Licenses |
Apache, MIT, GPL |
Others |
|
|
|
Metrics |
Deployment frequency, % of failed deployments, Time from check-in to deployment |
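The three metrics in the row above can be derived from a simple log of deployments. The following is a minimal sketch, assuming each record carries a check-in time, a deployment time and a success flag. The `Deployment` record and the `deployment_metrics` function are hypothetical names, not part of any CI/CD tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    """Hypothetical record of one deployment."""
    checked_in_at: datetime
    deployed_at: datetime
    succeeded: bool


def deployment_metrics(deployments, period_days):
    """Summarise deployment frequency, failure rate and check-in-to-deploy time."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    total = len(deployments)
    failed = sum(1 for d in deployments if not d.succeeded)
    lead_times = [d.deployed_at - d.checked_in_at for d in deployments]
    mean_lead_time = sum(lead_times, timedelta()) / total
    return {
        "deployments_per_day": total / period_days,
        "failed_percent": 100.0 * failed / total,
        "mean_checkin_to_deploy": mean_lead_time,
    }
```

The mean is used here for the check-in-to-deployment time only to keep the sketch short; a median or percentile may be more robust when a few deployments take unusually long.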