Light Controller
When deploying large-scale microservices to a cloud environment with a zero-trust network, the traditional approach of pulling each service to check whether it is healthy is no longer feasible. Instead, each service instance registers its IP address and port with the Portal Controller service through API calls during server startup and de-registers during server shutdown. This requires the API framework to have a component responsible for registering and de-registering.

We can also leverage the Portal Controller for service discovery, as the Controller holds the access information for all live nodes. Through an API exposed by the Controller, API consumers can query it for a list of instances of the service they are about to invoke. In a service-to-service call scenario, a service can register itself and, at the same time, query the Controller to discover its downstream APIs. A service can be called by multiple other services or clients in a complex call stack, and it might call several other services to fulfill its requests. The Portal Controller supports all of these scenarios and more.
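To make the startup registration and shutdown de-registration concrete, here is a minimal sketch of such a component. The controller endpoint, payload, and class names are illustrative assumptions rather than the actual Portal Controller API; in light-4j this responsibility belongs to the portal-registry module described below.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Illustrative registration component. The /services endpoint and the JSON
 * payload are hypothetical; the real Portal Controller API may differ.
 */
public class RegistrationHook {
    private static final String CONTROLLER = "https://portal-controller:8443/services";
    private final HttpClient client = HttpClient.newHttpClient();
    private final String serviceId;
    private final String address;
    private final int port;

    public RegistrationHook(String serviceId, String address, int port) {
        this.serviceId = serviceId;
        this.address = address;
        this.port = port;
    }

    /** Register this instance during server startup. */
    public void register() throws Exception {
        String body = String.format(
            "{\"serviceId\":\"%s\",\"address\":\"%s\",\"port\":%d}",
            serviceId, address, port);
        HttpRequest request = HttpRequest.newBuilder(URI.create(CONTROLLER))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        client.send(request, HttpResponse.BodyHandlers.discarding());
    }

    /** De-register this instance during server shutdown. */
    public void deregister() throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
            URI.create(CONTROLLER + "/" + serviceId + "/" + address + ":" + port))
            .DELETE()
            .build();
        client.send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```

A server would call register() once its listener is bound and deregister() from a shutdown hook; the portal-registry module performs the equivalent steps automatically.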
For a service that wants to register with the controller during server startup, you need a config file, portal-registry.yml, that defines the controller's URL. In the same file, you can set the health check interval, how long to wait before removing the service from the controller after its health check fails, and which kind of health check to use.
The controller supports HTTP and TTL health checks, and each service needs to choose between them. An HTTP check is initiated by the controller, which calls each service's /health/{serviceId} endpoint; this is the recommended option. A TTL check is for services running in a separate network where the controller cannot reach the health check endpoint.
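Here is a sketch of what portal-registry.yml might look like. The property names are assumptions for illustration only; check the portal-registry module documentation for the exact keys and defaults.

```yaml
# Hypothetical portal-registry.yml sketch; key names are assumptions.
portalUrl: https://portal-controller:8443   # URL of the Portal Controller
checkType: HTTP                             # HTTP (controller calls /health/{serviceId}) or TTL
checkInterval: 10s                          # how often the health check runs
deregisterAfter: 2m                         # remove the instance after failing checks for this long
```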
In light-4j, a module called portal-registry is responsible for the communication with the controller. You can find all the details on how to set up the module in your API in the cross-cutting concerns section.
The following sequence diagram illustrates the service startup and shutdown activities.
In the above picture, only one service is drawn as an example. In reality, there would be hundreds or thousands of services registering and de-registering themselves with the portal controller.
Because all live instances of each service are registered with the portal controller, the controller has a snapshot of all healthy instances at any time. This information is very valuable for global service discovery in a cloud environment. When a client needs to invoke a downstream service, it can query the controller for healthy instances by serviceId and an optional environment tag. The returned list can be cached on the client side so that the client doesn't need to call the controller for every downstream call. If the client uses the Http2Client provided by light-4j, the connection to the downstream API can also be cached, so the same connection is reused until a specific time (8 hours) or a certain number of requests (1 million) is reached.
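As an illustration of this lookup-and-cache pattern, the sketch below queries a hypothetical controller lookup endpoint by serviceId and optional tag and caches the returned instance list. The endpoint, query parameters, and cache policy are assumptions; in light-4j the built-in discovery and Http2Client take care of this for you.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative client-side discovery cache. The /services/lookup endpoint
 * and its parameters are hypothetical, not the actual controller API.
 */
public class DiscoveryCache {
    private static final String CONTROLLER = "https://portal-controller:8443";
    private final HttpClient client = HttpClient.newHttpClient();
    // serviceId|tag -> JSON list of healthy instances returned by the controller
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Return healthy instances for a serviceId and an optional environment tag. */
    public String lookup(String serviceId, String tag) {
        String key = serviceId + "|" + tag;
        // Reuse the cached list so the controller is not called on every request.
        return cache.computeIfAbsent(key, k -> query(serviceId, tag));
    }

    private String query(String serviceId, String tag) {
        try {
            URI uri = URI.create(CONTROLLER + "/services/lookup?serviceId="
                    + serviceId + (tag == null ? "" : "&tag=" + tag));
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (Exception e) {
            throw new RuntimeException("controller lookup failed", e);
        }
    }
}
```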
Service owners might scale their services up and down in a dynamic cloud environment to utilize resources efficiently. This means the number of instances per service, and their IP and port combinations, are constantly changing. All of these changes are captured by the controller, and we need a way to notify the clients that their downstream services have changed.
There are two options for the notifications: long poll (blocking query) and WebSocket. The Consul registry uses a blocking query, so each subscribed downstream service needs a dedicated thread to poll the controller constantly. This approach wastes a lot of CPU threads when there are multiple downstream APIs. The Portal Registry uses WebSocket instead, so updates for all subscriptions share a single channel, with only one callback to update the internal cache on the client.
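The sketch below illustrates the WebSocket style of notification with the JDK WebSocket client: one connection carries updates for every subscribed serviceId, and a single callback refreshes the local cache. The subscription URL and message format are assumptions made up for this example; the Portal Registry module implements the pattern internally.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.Map;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative subscriber: one WebSocket channel shared by all subscribed
 * services, with one callback. The URL and message format are hypothetical.
 */
public class RegistryWatcher implements WebSocket.Listener {
    // serviceId -> latest instance list pushed by the controller
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        // Assume messages of the form "serviceId=host1:port1,host2:port2".
        String[] parts = data.toString().split("=", 2);
        if (parts.length == 2) {
            cache.put(parts[0], parts[1]);
        }
        webSocket.request(1); // ask for the next message
        return null;
    }

    public static void main(String[] args) throws Exception {
        // A single connection replaces one blocking-query thread per downstream service.
        HttpClient.newHttpClient()
            .newWebSocketBuilder()
            .buildAsync(URI.create("wss://portal-controller:8443/ws"), new RegistryWatcher())
            .join();
        Thread.sleep(60_000); // keep the process alive to receive pushes in this demo
    }
}
```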
Here is a picture of the client discovery and synchronization with the controller.
Please note that there is only one client in the picture to keep the diagram simple and easy to understand. In a real environment, there would be hundreds or even thousands of clients. A service can itself be a client that calls several other services, and it can in turn be called by several clients.
Although the light-controller is not an open-source project, it is free to use as a Docker container. We also provide a docker-compose-controller.yml in the light-docker project, with all the configuration files externalized to the controller folder for customization.