Fail Fast vs Fail Slow

Why microservices need to fail-fast instead of fail-slow

Conserve resources

When an error occurs, whether it is an exception or a validation error, the service should stop processing immediately and return the error status to the consumer. This conserves the server's processing power.
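The idea can be sketched as a validator that returns on the first failed check instead of collecting every error. This is only a minimal illustration: `TransferRequest`, `ErrorStatus`, `FailFastValidator`, and the error codes are hypothetical names invented for the example, not part of any Light platform API.

```java
import java.util.Optional;

// Hypothetical request type, used only for illustration.
final class TransferRequest {
    final String fromAccount;
    final String toAccount;
    final long amountCents;

    TransferRequest(String fromAccount, String toAccount, long amountCents) {
        this.fromAccount = fromAccount;
        this.toAccount = toAccount;
        this.amountCents = amountCents;
    }
}

// Hypothetical error status: an HTTP status plus a short error code.
final class ErrorStatus {
    final int httpStatus;
    final String code;

    ErrorStatus(int httpStatus, String code) {
        this.httpStatus = httpStatus;
        this.code = code;
    }
}

final class FailFastValidator {
    // Returns the first error found and skips all remaining checks,
    // or an empty Optional when the request passes every check.
    static Optional<ErrorStatus> validate(TransferRequest req) {
        if (req.fromAccount == null || req.fromAccount.trim().isEmpty()) {
            return Optional.of(new ErrorStatus(400, "ERR_MISSING_FROM_ACCOUNT"));
        }
        if (req.toAccount == null || req.toAccount.trim().isEmpty()) {
            return Optional.of(new ErrorStatus(400, "ERR_MISSING_TO_ACCOUNT"));
        }
        if (req.amountCents <= 0) {
            return Optional.of(new ErrorStatus(400, "ERR_INVALID_AMOUNT"));
        }
        return Optional.empty();
    }
}
```

A request with both a missing source account and an invalid amount yields only the first error; the later checks never run.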

Security concerns

Another reason not to output too much information in the response is security. An attacker who probes your service will construct invalid requests and study the error responses. If a response contains too much information, it can dramatically simplify the attack. Legitimate consumer developers can rely on centralized logging to find out what really happened when a simple error code is returned.
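One way to keep the response small while still leaving a trail in the logs is sketched below. The class name `SafeErrorResponder`, the `ERR_INTERNAL` code, and the use of a correlation id to link the response to the log entry are assumptions made for this example. The source only states that the response should be a simple error code and that details belong in centralized logging.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: the consumer sees a generic code, the log keeps the details.
final class SafeErrorResponder {
    private static final Logger LOG = Logger.getLogger(SafeErrorResponder.class.getName());

    // Builds the body sent to the consumer: an error code and a correlation id only.
    static String toResponseBody(String errorCode, String correlationId) {
        return "{\"code\":\"" + errorCode + "\",\"correlationId\":\"" + correlationId + "\"}";
    }

    // Logs the full exception (message and stack trace) on the server side
    // and returns a body that reveals nothing about the internal cause.
    static String handle(Exception e) {
        String correlationId = UUID.randomUUID().toString();
        LOG.log(Level.SEVERE, "Request failed, correlationId=" + correlationId, e);
        return toResponseBody("ERR_INTERNAL", correlationId);
    }
}
```

A consumer developer who receives the error can quote the correlation id, and the matching log entry can then be looked up in the centralized logging system.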

Cascade failure

The longer a service spends on additional steps after an exception, the slower its response will be. Normally, processing on the exception branches is significantly slower than on the happy path. This keeps the consumer waiting, and the consumer's own consumers waiting in turn. In a Java EE technology stack, the consumer's thread pool can easily be exhausted, leaving the server unable to respond to new requests. The same failure then propagates quickly up the chain of consumers and can eventually bring down the entire system. That is why it is highly recommended to implement the circuit breaker and bulkhead patterns.
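To make the circuit breaker idea concrete, here is a deliberately simplified sketch. `SimpleCircuitBreaker` and its parameters are hypothetical; the class is not thread-safe and omits the timeouts and metrics a production breaker needs, so an established library would be the better choice in a real service.

```java
import java.util.function.Supplier;

// Simplified, single-threaded circuit breaker for illustration only.
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long to stay open before a trial call
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Runs the downstream call, or returns the fallback immediately
    // while the circuit is open.
    <T> T call(Supplier<T> remoteCall, T fallback) {
        long now = System.currentTimeMillis();
        if (open) {
            if (now - openedAt < openMillis) {
                return fallback; // fail fast: the downstream service is not called
            }
            open = false; // cool-down elapsed: allow one trial call
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0; // success resets the failure count
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                open = true; // too many failures: open (or reopen) the circuit
                openedAt = now;
            }
            return fallback;
        }
    }
}
```

Once the threshold is reached, callers get the fallback at once instead of tying up a thread on a service that is already struggling, which is what stops the failure from cascading.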

Why designers keep using fail-slow

Designers who choose fail-slow usually do so based on several wrong assumptions.

The service will be called by the UI

Most services in a microservices architecture are consumed by other services or by a web server or service aggregator. These types of consumers want fail-fast behavior, and the response should be as simple as possible.

Users need to have all the errors in one shot

Most arguments for the fail-slow design aim to give users all the validation errors at once so that they can fix all of them and resubmit. This actually comes from the old world of server-side rendering based on JSP or Servlet. Response times on these systems are miserable, and designers have to save every round trip to the server in order to improve the user experience and reduce server load, because Java EE is blocking and throughput is constrained by the number of threads. With this mentality, responses get bigger and bigger and response times get slower and slower.

When an organization adopts a microservices architecture, chances are it will build its UI as a native mobile app or a single-page app (Angular/React) that talks to the services. In this type of design, communication between client and server is based on a small JSON object or even a ProtoBuf binary, and the response time is usually within 10 milliseconds. The entire validation design changes from validate-on-submit to validate-as-you-type. Take a look at Google.com: when you search, every character you type triggers a request and response between your browser and the google.com server. This is an extreme case, as Google has the power and resources to do it. However, for most microservices-based solutions, validating when the user moves the cursor from one field to another on a form is a piece of cake. Because each field is validated individually, the error message is small and contains only one error at a time.
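A per-field validation endpoint could be backed by something like the sketch below, which checks exactly one field per call and returns at most one error code. The `FieldValidator` class, the field names, the error codes, and the validation rules (including the loose email pattern) are all assumptions made for illustration.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical per-field validator: one field in, at most one error code out.
final class FieldValidator {
    // Deliberately loose email pattern, for illustration only.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    static Optional<String> validateField(String field, String value) {
        if (field == null) {
            return Optional.of("ERR_UNKNOWN_FIELD");
        }
        if (value == null) {
            return Optional.of("ERR_MISSING_VALUE");
        }
        switch (field) {
            case "email":
                return EMAIL.matcher(value).matches()
                        ? Optional.empty()
                        : Optional.of("ERR_INVALID_EMAIL");
            case "age":
                try {
                    int age = Integer.parseInt(value.trim());
                    return (age >= 0 && age <= 150)
                            ? Optional.empty()
                            : Optional.of("ERR_INVALID_AGE");
                } catch (NumberFormatException e) {
                    return Optional.of("ERR_INVALID_AGE");
                }
            default:
                return Optional.of("ERR_UNKNOWN_FIELD");
        }
    }
}
```

The client would call this when the cursor leaves a field, so each response carries a single small error code rather than a list of every problem on the form.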

Conclusion

Given the drawbacks of fail-slow, and given that its suitable use cases have been replaced by the SPA, there is no need to design your server to be fail-slow. For a microservices architecture, fail-fast is one of the principles.

See Also

  • Eco System
  • CQRS
  • Event Sourcing
  • Service Mesh
  • JavaEE declining
“Fail Fast vs Fail Slow” was last updated: April 2, 2019: fixes #62 add Chinese language for the document site (5c820aa)