When dealing with system-to-system calls, there are two basic strategies: synchronous and asynchronous communication.
Synchronous Communications
- Over HTTP using REST
- Synchronous in nature
- Calls wait for a response; importantly, that response is sent only after the request is fully processed
- Each call becomes a blocking call
- The client must wait for a success or failure, indicated by a status code, from the service being called
- Call paths can get deep
- It can become a bottleneck for longer processes that span many services
Asynchronous Communications
- HTTP using REST
- The client sends the call; the server immediately responds with an acceptance status code (for example, 202 Accepted)
- Polling or Push
- The client then either polls the server or waits for a push message on a callback URL to determine whether the work finished successfully or failed
- Messaging system
- Uses a messaging system such as RabbitMQ, Kafka, JMS, or others
- Drop message and move on
- The message is put on the system, and downstream consumers work on that message
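The accept-then-poll flow described above can be sketched in a few lines. This is a minimal in-process illustration, not a real HTTP service: `submit`, `poll`, and the in-memory `_jobs` store are hypothetical names, and a thread pool stands in for the server doing the work in the background.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for a real service and its job store.
_executor = ThreadPoolExecutor(max_workers=2)
_jobs = {}


def submit(work, *args):
    """Accept the request and return immediately, like an HTTP 202 Accepted."""
    job_id = str(uuid.uuid4())
    _jobs[job_id] = _executor.submit(work, *args)
    return 202, job_id


def poll(job_id):
    """Report the job state, like a GET on a status URL."""
    future = _jobs[job_id]
    if not future.done():
        return "pending", None
    if future.exception() is not None:
        return "failed", str(future.exception())
    return "succeeded", future.result()
```

The client keeps the returned job id and calls `poll` until the state is no longer "pending"; a push model would instead have the server call back to the client when the work finishes.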
When to Use Asynchronous
- Offload Strain
- When not every call is a blocking call, you can leverage more processing power behind the scenes without impacting your customer
- Improve user experience
- Not every call needs an immediate response
- There is a correlation between user wait time and satisfaction
- Offloading work to asynchronous processing can improve user satisfaction
- Improve system health
- Many jobs take a while to run to completion; using asynchronous communication not only offloads strain but, in doing so, keeps the system healthier
- Workflows
- Allows you to build in natural retries without negatively impacting the performance of the processes involved
When and why to use it
Prevent Gridlock
In TCP/IP networking, this used to be a real issue with routing protocols: it was possible to bring a network to its knees through what is called a broadcast storm. Each node broadcast its routing table to the whole network, and the network itself would become gridlocked. A similar pattern can emerge in a microservice architecture.
- Congestion
- As more and more services enter the system (or, even more commonly, the container orchestrator), the chattiness of communications can lead to congestion in the network
- Then you add to it the many network calls needed to complete one single operation
- Exponential traffic
- Congestion grows exponentially as more services are added to the system, because we seldom build one-to-one calls
- The more services on the network, and the more services they consume, the heavier the load becomes
- Slow services
- One slow service can exacerbate this issue even more
- That latency is felt in all services consuming the slow service
- In addition, those blocking calls yield more waiting, which increases the latency as more and more downstream services are impacted
The single most effective way to reduce gridlock in the system is to remove blocking calls where they are not needed. Less traffic waiting on responses improves traffic flow, which in turn can reduce gridlock dramatically.
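To make the effect of one slow service concrete, here is a small worked calculation. The call graph, the service names, and the millisecond figures are all made up for illustration, and the model assumes each service makes its downstream calls one after another and ignores the cost of enqueuing a message.

```python
# Hypothetical call graph: each service lists the services it calls synchronously.
calls = {
    "gateway": ["orders"],
    "orders": ["inventory", "payments"],
}
# Hypothetical time in milliseconds that each service spends on its own work.
own_ms = {"gateway": 5, "orders": 20, "inventory": 30, "payments": 400}


def blocking_latency(service, calls, own_ms):
    """Latency seen by the caller when every downstream call blocks in sequence."""
    downstream = sum(blocking_latency(s, calls, own_ms) for s in calls.get(service, []))
    return own_ms[service] + downstream


all_blocking = blocking_latency("gateway", calls, own_ms)

# Same graph, but "orders" now hands the payment work to a queue instead of waiting.
calls_async = {"gateway": ["orders"], "orders": ["inventory"]}
payments_offloaded = blocking_latency("gateway", calls_async, own_ms)
```

With these numbers `all_blocking` is 455 ms and `payments_offloaded` is 55 ms: the slow service still takes 400 ms to do its work, but nothing upstream waits on it.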
Long-Running Processes
- Blocking
- As discussed under gridlock, blocking calls can contribute to network congestion; with long-running processes that issue becomes worse
- Natural retry
- Asynchronous messaging provides an opportunity to build natural retry logic without another network call
- The consumer of the asynchronous message can handle its own retry logic without the client having any input at all
- This can actually reduce the overall error state of the system
- No more call trees
- Long-running processes usually have complex call trees needed to get the work done
- When moving to asynchronous messaging, the call trees become non-existent, which again improves overall performance
- Each system handles its work and passes the message on in its original or a transformed state
- Beyond performance, this reduces the complexity of the calls from the system's point of view
- Timeouts
- Can reduce the risk of timeouts
- When long-running processes are blocking calls from the client side, there is a risk of natural timeouts in those blocking calls; this can cause the service to flap or become unusable as sockets are cleaned up
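The natural-retry point above can be sketched as a consumer that retries a failed message on its own, with no further call from the client. This is a minimal sketch: `consume_with_retry`, `max_attempts`, and the use of `queue.Queue` as a stand-in for a broker queue are assumptions for illustration, and a real consumer would usually add a delay or backoff between attempts.

```python
import queue


def consume_with_retry(inbox, handler, max_attempts=3):
    """Drain the inbox, retrying each failed message locally up to max_attempts."""
    results = []
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            return results
        for attempt in range(1, max_attempts + 1):
            try:
                # Record the message, the handler's result, and the attempt that succeeded.
                results.append((message, handler(message), attempt))
                break
            except Exception:
                if attempt == max_attempts:
                    # Retries exhausted: record the failure instead of raising to the producer.
                    results.append((message, None, attempt))
```

The producer that put the message on the queue never sees the failed attempts; it dropped the message and moved on.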
Reduce Coupling
- Monolithic Microservices
- Making everything synchronous creates a highly coupled system
- We end up building a new monolith with a lot more complexity and chattiness
- Without solving any of the real problems we intended to solve
- Consumer not known
- The consumer of your asynchronous message isn’t known
- You communicate only through the contract of the message, and as such your coupling is naturally reduced
- Highly coupled call trees
- If you build call trees to get work done, where your client calls several services that in turn call several services, all tied to the original call, those call trees can become a nightmare
- In asynchronous messaging, each consumer acts independently of the rest
Additional Benefits
- QoS/Priority
- You can build QoS/priority into your system
- This allows the consumer to prioritize the work in a way that makes sense for your business without negatively impacting your users
- Fault tolerance
- Fault tolerance also becomes easier to solve
- In a synchronous flow, if the called service is unavailable, you may wait for a response that will never come, and you need client-side logic to handle this
- In asynchronous systems, the contract handles the case of an unavailable system, and the message won’t block or time out
- Response not needed immediately
- Logging / Metrics / Analytics
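The QoS/priority idea above can be illustrated with Python's standard `queue.PriorityQueue`. This is only a sketch: the job names and priority numbers are made up, and a real broker would expose priorities through its own configuration rather than through this class.

```python
import queue

work = queue.PriorityQueue()

# Each entry is (priority, sequence, payload). Lower priority numbers are served
# first; the sequence number keeps arrival order among entries of equal priority.
work.put((2, 0, "analytics-rollup"))
work.put((0, 1, "checkout-payment"))
work.put((1, 2, "send-receipt-email"))


def drain(q):
    """Return payloads in the order a consumer would process them."""
    order = []
    while not q.empty():
        _, _, payload = q.get()
        order.append(payload)
    return order
```

Calling `drain(work)` yields the payment job first and the analytics job last, even though the analytics job was enqueued first.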
The Tradeoffs
Complexity
- Artifact sprawl
- Usually, consumers of messages are individual artifacts that have their own repository, build pipeline, deployment pipeline, and config management
- Disconnected code path
- Having disconnected paths makes debugging, troubleshooting, and evaluating outcomes significantly more complex
- Multiple paths
- Once you add fan-outs or conditional jobs, your system increases in complexity in all aspects, from development to operations
Observability
- Lack of immediate response
- Means that your inspection is disconnected
- You lose the ability to log the error or success in the client immediately
- As such, you have to inspect other logs to complete the view of the whole process
- Log aggregation
- Logging is critical for observability of the health of the system
- In a synchronous system, logs and errors are unified
- In an asynchronous system, you often need to add things like correlation IDs to your log messages to help you aggregate the logs and reassemble calls
- Metrics collection
- Even with components isolated, you may still need to figure out the upstream and downstream causes of abnormal load
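The correlation ID technique mentioned above can be sketched as follows. The names `new_message`, `log`, and `trace`, and the in-memory `log_lines` list standing in for an aggregated log store, are all hypothetical.

```python
import uuid

log_lines = []  # stand-in for an aggregated log store


def new_message(payload):
    """Attach a correlation ID once, where the work enters the system."""
    return {"correlation_id": str(uuid.uuid4()), "payload": payload}


def log(service, correlation_id, text):
    """Every service writes the correlation ID it received along with its log line."""
    log_lines.append({"service": service, "correlation_id": correlation_id, "text": text})


def trace(correlation_id):
    """Reassemble one request's path from the interleaved logs of all services."""
    return [
        (entry["service"], entry["text"])
        for entry in log_lines
        if entry["correlation_id"] == correlation_id
    ]
```

As long as each consumer copies the ID from the incoming message into its own log lines and into any message it emits, `trace` can rebuild the whole path of one piece of work.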
Additional Complexity
- Additional components
- The number of components will increase, especially with an asynchronous messaging system, and that adds complexity in several areas
- Each new component not only brings the overhead of that component across build and deployment pipelines but also resource utilization
- New artifacts, especially in containerized environments, have their own overhead with respect to resources
- You potentially have more database connections, more RAM and CPU, and other costs
- In a commercial containerized environment, you will have more containers running, which also impacts your operational cost
- Operational run books
- The asynchronous nature of your systems and the expected hops must be well documented
- Operations must be able to trace tasks through the system without needing to debug the code of possibly hundreds of artifacts
- You need to provide as much information as possible via logs to allow for aggregation and quick inspection of potentially troubled workloads
- It's critical to ensure the system can be run and maintained
- You need to build and maintain operational run books for every system component
- Issue source identification
- Finding the source of an issue is significantly more complex
- While an issue may still arise in a single component, and that component's logs can help, the root cause is not easy to find
- There are times when contracts may be violated, or new use cases not originally intended cause a message and its subsequent consumer to be fired in a bad state
- This new use case, or the modification of an existing one, can make identifying the issue source significantly harder
- In a choreographed event model, the downstream impact of an issue may be several hops down the message chain; therefore, tracing the issue through the call stack is harder
In asynchronous messaging:
Producer:
- A producer is the system that creates the message for some system to act on
- It is responsible, under the correct conditions, for building a message using the contractual format and dispatching it to the message broker
Consumer / Receiver:
- The system that receives messages from the message broker; there are several ways this can be accomplished
- Ultimately, once the message is received, the consumer acts on that message
Dead letter queue:
- Where messages that could not be delivered or processed go
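The three roles above can be sketched together. This is a minimal in-process illustration under stated assumptions: `OrderPlaced` is a hypothetical message contract, and two `queue.Queue` objects stand in for a real broker queue and its dead letter queue.

```python
import json
import queue
from dataclasses import asdict, dataclass


@dataclass
class OrderPlaced:  # hypothetical message contract
    order_id: str
    amount_cents: int


broker = queue.Queue()        # stand-in for a broker queue
dead_letters = queue.Queue()  # where messages that cannot be processed go


def produce(event):
    """Producer: build a message in the contractual format and dispatch it."""
    broker.put(json.dumps(asdict(event)))


def consume_all(handler):
    """Consumer: act on each message; park anything that breaks the contract."""
    handled = []
    while not broker.empty():
        raw = broker.get()
        try:
            event = OrderPlaced(**json.loads(raw))
            handled.append(handler(event))
        except (TypeError, ValueError) as exc:
            # Invalid JSON raises ValueError; missing or extra fields raise TypeError.
            dead_letters.put({"raw": raw, "error": str(exc)})
    return handled
```

Messages in `dead_letters` keep the raw payload and the error, so they can be inspected or replayed later without blocking the rest of the queue.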
Message Broker
The message broker is one of the core components required for asynchronous messaging in a microservices architecture.
Heart of system:
- Centrally responsible for all messaging in the system
Translation:
- It provides a native mechanism to translate messages from one system's format into another's
- It can translate and transform the message as it comes in and prepare it for the consumer as needed
Routing:
- Routing comes in many forms, from simple point-to-point to inspection-based routing and fanning out of messages
Aggregation:
- Messages can usually be aggregated as needed
- If several messages should be broken apart or reassembled, the broker can do this work for you
- So you can again target messages in the most efficient way possible
Errors:
- One of the most ignored aspects of message brokers is the ability to handle errors in the system and respond or alert on them
The message broker handles these tasks for you, so that your consumers are only impacted by the messages they need to respond to.
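The routing role can be illustrated with a toy broker that fans a message out to every subscriber of a topic. `InMemoryBroker` and the topic names are hypothetical, and delivery here is a direct function call; a real broker would queue messages and deliver them asynchronously. The sketch only shows that the producer publishes to a topic and never names a consumer.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy broker: routes each published message to all subscribers of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Fan the message out and return how many consumers received it."""
        delivered = 0
        for callback in self._subscribers[topic]:
            callback(message)
            delivered += 1
        return delivered
```

Adding a new consumer is a `subscribe` call on the broker; the producer's code does not change.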
Common Message Brokers
- RabbitMQ
- Apache ActiveMQ
- JMS
- Apache Kafka
- Cache