Why microservices need to fail-fast instead of fail-slow
When an error occurs, whether it is an exception or a validation error, the service
should stop processing and return the error status to the consumer. This conserves the
server's processing power.
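
As a rough illustration, here is a minimal Java sketch of fail-fast validation. The
TransferRequest type and the error codes are hypothetical, invented only for this example.
The point is that validate returns on the first failed check and never runs the rest.

```java
import java.util.Optional;

final class FailFastValidation {

    /** Hypothetical request payload, invented for this example. */
    static final class TransferRequest {
        final String fromAccount;
        final String toAccount;
        final long amountCents;

        TransferRequest(String fromAccount, String toAccount, long amountCents) {
            this.fromAccount = fromAccount;
            this.toAccount = toAccount;
            this.amountCents = amountCents;
        }
    }

    /** Returns the first error code encountered, or empty when the request is valid. */
    static Optional<String> validate(TransferRequest request) {
        if (request.fromAccount == null || request.fromAccount.isEmpty()) {
            return Optional.of("ERR_FROM_ACCOUNT_REQUIRED"); // stop here, skip later checks
        }
        if (request.toAccount == null || request.toAccount.isEmpty()) {
            return Optional.of("ERR_TO_ACCOUNT_REQUIRED");
        }
        if (request.amountCents <= 0) {
            return Optional.of("ERR_AMOUNT_NOT_POSITIVE");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two fields are invalid, but only the first problem is reported.
        TransferRequest bad = new TransferRequest("", "ACC-2", -5);
        System.out.println(validate(bad).orElse("OK")); // prints ERR_FROM_ACCOUNT_REQUIRED
    }
}
```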
Another reason not to put too much information in the response is security. If an
attacker probes your service, they will construct invalid requests and study the error
responses. If there is too much information in those responses, it can dramatically
simplify the attack. Legitimate consumer developers have centralized logging to help them
identify what really happened when a simple error code is returned.
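
One way to keep the response small while still leaving a trail for developers is sketched
below. It is only an illustration: the ErrorBody shape and the correlation ID are assumptions
not taken from any particular framework, and java.util.logging stands in for whatever
centralized logging the organization actually uses.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ErrorResponses {
    private static final Logger LOG = Logger.getLogger(ErrorResponses.class.getName());

    /** Hypothetical minimal error body: a stable code plus an ID to look up in the logs. */
    static final class ErrorBody {
        final String code;
        final String correlationId;

        ErrorBody(String code, String correlationId) {
            this.code = code;
            this.correlationId = correlationId;
        }

        String toJson() {
            return "{\"code\":\"" + code + "\",\"correlationId\":\"" + correlationId + "\"}";
        }
    }

    /** Logs the full exception on the server side and returns only the code to the caller. */
    static ErrorBody toErrorBody(String code, Throwable cause) {
        String correlationId = UUID.randomUUID().toString();
        // Stack trace and message go to the log, never into the response.
        LOG.log(Level.SEVERE, "correlationId=" + correlationId + " code=" + code, cause);
        return new ErrorBody(code, correlationId);
    }

    public static void main(String[] args) {
        ErrorBody body = toErrorBody("ERR_INTERNAL", new IllegalStateException("table ACCOUNTS is locked"));
        System.out.println(body.toJson()); // contains the code and ID, not the exception message
    }
}
```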
The more time the service spends on additional steps after an exception, the slower the
response will be. Normally, processing on the exception branches is significantly
slower than the happy path. This keeps the consumer waiting, and the consumer's consumer
as well. In a Java EE technology stack, the consumer's thread pool can easily be exhausted,
and the server stops responding to new requests. The same failure quickly propagates to
the consumer's consumer and eventually brings down the entire system. That is why it is
highly recommended to implement the circuit breaker and bulkhead patterns.
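
To make the circuit breaker idea concrete, here is a deliberately small Java sketch. It is
not a production implementation and not the API of any library. The class name, the threshold,
and the injected clock are all assumptions for illustration. After a number of consecutive
failures the breaker opens and returns the fallback immediately, without calling the slow
downstream service, until a cool-down period has passed.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal circuit breaker sketch; names and thresholds are illustrative. */
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to reject calls once open
    private final LongSupplier clock;   // injected so the cool-down can be tested
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Runs the call unless the circuit is open; returns the fallback on rejection or failure. */
    <T> T call(Supplier<T> remoteCall, T fallback) {
        if (!allowRequest()) {
            return fallback; // open: fail fast without touching the remote service
        }
        try {
            T result = remoteCall.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback;
        }
    }

    private synchronized boolean allowRequest() {
        if (openedAt < 0) {
            return true;
        }
        if (clock.getAsLong() - openedAt < openMillis) {
            return false;
        }
        // Cool-down elapsed: allow a trial call; one more failure re-opens the circuit.
        openedAt = -1;
        consecutiveFailures = failureThreshold - 1;
        return true;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 5_000, System::currentTimeMillis);
        for (int i = 0; i < 4; i++) {
            // The third and fourth iterations return the fallback without running the lambda.
            String r = breaker.call(() -> { throw new IllegalStateException("downstream unavailable"); }, "fallback");
            System.out.println(r);
        }
    }
}
```

A real system would more likely use an existing library for this, and would pair it with a
bulkhead (for example, a bounded pool or semaphore per downstream dependency) so that one
slow dependency cannot consume every thread.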
Why designers keep using fail-slow
Designers make several wrong assumptions when they decide to use fail-slow.
The service will be called by the UI
Most services in a microservices architecture are consumed by other services or by a web
server/service aggregator. These types of consumers want fail-fast, and the response should
be as simple as possible.
Users need to have all the errors in one shot
Most arguments for the fail-slow design are about giving users all the validation errors at
once so that they can fix all of them and resubmit. This actually comes from the old world of
server-side rendering based on JSP or Servlets. Response times are miserable on these systems,
and designers have to save every round trip to the server in order to improve the user
experience and reduce the load on the server, as Java EE is blocking and throughput is
constrained by the number of threads. With this mentality, responses get bigger and bigger
and response times get slower and slower.
When an organization adopts a microservices architecture, chances are it will build its
UI as a native mobile app or a single-page app (Angular/React) that talks to the services. In
this type of design, communication between client and server is based on a small JSON object
or even a ProtoBuf binary, and the response time is usually within 10 milliseconds. The entire
validation design changes from validate-on-submit to validate-as-you-type. Take a look
at Google.com: when searching, every character you type triggers a request and response
between your browser and the google.com server. This is the extreme case, as Google has the
power and resources to do that. However, for most microservices-based solutions, validating
when the user moves the cursor from one field to another on a form is a piece of cake. Because
each field is validated individually, the error message is small and contains only one error
at a time.
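
On the server side, per-field validation could look roughly like the Java sketch below. The
field names, rules, and error codes are made up for this example. The idea is that the client
sends one field at a time and gets back at most one short error code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class FieldValidation {
    // Hypothetical per-field rules: each returns an error code, or empty if the value is fine.
    private static final Map<String, Function<String, Optional<String>>> RULES = new HashMap<>();

    static {
        RULES.put("email", v -> v.contains("@")
                ? Optional.<String>empty() : Optional.of("ERR_EMAIL_FORMAT"));
        RULES.put("age", v -> v.matches("\\d{1,3}")
                ? Optional.<String>empty() : Optional.of("ERR_AGE_NOT_A_NUMBER"));
    }

    /** Validates a single field, as a client would request when the cursor leaves it. */
    static Optional<String> validateField(String field, String value) {
        Function<String, Optional<String>> rule = RULES.get(field);
        if (rule == null) {
            return Optional.of("ERR_UNKNOWN_FIELD");
        }
        return rule.apply(value == null ? "" : value);
    }

    public static void main(String[] args) {
        System.out.println(validateField("email", "not-an-email").orElse("OK")); // prints ERR_EMAIL_FORMAT
        System.out.println(validateField("age", "42").orElse("OK"));             // prints OK
    }
}
```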
Given the drawbacks of fail-slow, and given that its suitable use cases have been replaced
by SPAs, there is no need to design your server to be fail-slow. For a microservices
architecture, one of the principles is fail-fast.