Health Check

Once a service is registered with the controller through the service post endpoint, the controller allocates a task that periodically sends health check requests to the service instance. The health check task is executed differently depending on the running mode.

Demo Mode

Cluster Mode

Register

In this mode, once a service is registered, a TaskDefinition with action ‘INSERT’ is sent directly to the light-scheduler Kafka topic, because the controller has Kafka access and also runs a Kafka streams application.

The light-scheduler streams will process the task definition and push a task execution command to the target topic specified in the task definition.

We have a health check streams application in the controller that processes the task execution commands in the topic and executes the health check tasks in separate threads. The execution commands are spread across Kafka partitions, so processing scales easily across multiple controller instances. If one instance is down, another instance takes over its partitions and handles the health check commands in them.

De-register

Once the registered service shuts down, it invokes the service delete endpoint. This handler sends the same task definition with action ‘DELETE’ to the light-scheduler topic, and the light-scheduler processes it to stop the task scheduling.

Streams Processor

In the process method, we need to cast the value to a TaskDefinition object. If the value is not of that type, we log an error and ignore the message.
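
The type check described above can be sketched as follows. This is a minimal illustration, not the controller's actual processor: the class ValueGuard and the nested Task type are hypothetical stand-ins for the real processor and the scheduler's TaskDefinition, and the error is printed rather than logged.

```java
import java.util.Optional;

final class ValueGuard {
    /** Hypothetical stand-in for the scheduler's TaskDefinition; only the type matters here. */
    static final class Task {
        final String host;
        Task(String host) { this.host = host; }
    }

    /** Returns the value as a Task, or empty if it is null or of an unexpected type. */
    static Optional<Task> asTask(Object value) {
        if (value instanceof Task) {
            return Optional.of((Task) value);
        }
        // The real processor would log an error here; the message is then ignored.
        System.err.println("Unexpected value type: "
                + (value == null ? "null" : value.getClass().getName()));
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(asTask(new Task("localhost")).isPresent());
        System.out.println(asTask("not a task").isPresent());
    }
}
```

Returning an empty Optional keeps the "ignore the message" path explicit for the caller instead of relying on a ClassCastException.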

  • When to skip the TaskDefinition command?

When a service de-registers itself, the ServiceDeleteHandler sends a TaskDefinition to the light-scheduler to stop the health check execution. However, this last TaskDefinition is also forwarded as a health check command, and it does not carry a data object. Processing it would cause an NPE in the health check streams. To avoid the NPE, we process a message only if its action is not DELETE.

The other case in which we need to skip the command is when the message start timestamp is too old. First, we calculate the grace period as two times the task frequency. If the start time is older than the current time minus the grace period, the streams application ignores the command.

    // Grace period in milliseconds: twice the task frequency.
    long gracePeriod = TimeUtil.oneTimeUnitMillisecond(TimeUnit.valueOf(taskDefinition.getFrequency().getTimeUnit().name())) * taskDefinition.getFrequency().getTime() * 2;
    if(logger.isTraceEnabled()) logger.trace("current = " + System.currentTimeMillis() + " task start = " + taskDefinition.getStart() + " gracePeriod = " + gracePeriod);
    // Process the command only if it is not a DELETE and its start time falls within the grace period.
    if(DefinitionAction.DELETE != taskDefinition.getAction() && System.currentTimeMillis() - taskDefinition.getStart() < gracePeriod) {
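
To make the two skip rules easier to test in isolation, the same decision can be sketched as a pure function. This is an illustration under assumptions: GracePeriodCheck and its Action enum are hypothetical names, the current time is passed in as a parameter, and the standard TimeUnit.toMillis call is used in place of the TimeUtil helper from the snippet above.

```java
import java.util.concurrent.TimeUnit;

final class GracePeriodCheck {
    /** Hypothetical mirror of the two actions named in this document. */
    enum Action { INSERT, DELETE }

    /** Grace period in milliseconds: twice the task frequency. */
    static long gracePeriodMillis(TimeUnit unit, long time) {
        return unit.toMillis(time) * 2;
    }

    /** True if the command should be executed, false if it should be skipped. */
    static boolean shouldProcess(Action action, long startMillis, long nowMillis,
                                 TimeUnit unit, long time) {
        return action != Action.DELETE
                && nowMillis - startMillis < gracePeriodMillis(unit, time);
    }

    public static void main(String[] args) {
        // Frequency of 10 seconds gives a grace period of 20,000 ms.
        System.out.println(gracePeriodMillis(TimeUnit.SECONDS, 10));
        System.out.println(shouldProcess(Action.INSERT, 0L, 19_999L, TimeUnit.SECONDS, 10));
        System.out.println(shouldProcess(Action.INSERT, 0L, 20_000L, TimeUnit.SECONDS, 10));
        System.out.println(shouldProcess(Action.DELETE, 0L, 1_000L, TimeUnit.SECONDS, 10));
    }
}
```

With a 10-second frequency, a command whose start time is 20 seconds or more in the past is skipped, as is any DELETE command regardless of its age.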

  • How do we know if an HTTP check or a TTL check is used?

When a TaskDefinition is received from the light-scheduler, it has a data section that contains detailed information for the health check.

If the healthPath is not empty, then it is an HTTP check. Otherwise, it is a TTL check.
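
The selection rule can be sketched as below. The healthPath name comes from the text above; representing the data section as a Map, and the names CheckTypeSelector and CheckType, are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.Map;

final class CheckTypeSelector {
    enum CheckType { HTTP, TTL }

    /** HTTP if the data section carries a non-empty healthPath, TTL otherwise. */
    static CheckType select(Map<String, Object> data) {
        Object path = data.get("healthPath");
        boolean hasPath = path instanceof String && !((String) path).isEmpty();
        return hasPath ? CheckType.HTTP : CheckType.TTL;
    }

    public static void main(String[] args) {
        Map<String, Object> withPath = Collections.<String, Object>singletonMap("healthPath", "/health");
        Map<String, Object> withoutPath = Collections.<String, Object>emptyMap();
        System.out.println(select(withPath));
        System.out.println(select(withoutPath));
    }
}
```

In this sketch a missing healthPath and an empty one are treated the same way, so both fall back to the TTL check.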

  • How is the service removed from the registry and the health check scheduling stopped?

If an HTTP or TTL check fails, the streams application invokes the removeNode method to decide whether to remove the service from the registry and stop the health check scheduling.


    if(System.currentTimeMillis() - Long.valueOf(healthMap.get("deregisterCriticalServiceAfter")) > Long.valueOf(healthMap.get("lastFailedTimestamp"))) {
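
Read algebraically, the condition above holds once more than deregisterCriticalServiceAfter milliseconds have passed since lastFailedTimestamp. The sketch below isolates that comparison so it can be checked with fixed timestamps. It assumes that healthMap stores both values as strings holding milliseconds; the class name DeregisterDecision and the injected clock value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class DeregisterDecision {
    /**
     * True when the failure has lasted longer than the configured window.
     * Assumes both map values are decimal strings in milliseconds.
     */
    static boolean shouldRemove(Map<String, String> healthMap, long nowMillis) {
        long deregisterAfter = Long.parseLong(healthMap.get("deregisterCriticalServiceAfter"));
        long lastFailed = Long.parseLong(healthMap.get("lastFailedTimestamp"));
        // Same comparison as the snippet above, with the clock passed in.
        return nowMillis - deregisterAfter > lastFailed;
    }

    public static void main(String[] args) {
        Map<String, String> healthMap = new HashMap<>();
        healthMap.put("deregisterCriticalServiceAfter", "120000"); // two minutes
        healthMap.put("lastFailedTimestamp", "1000");
        System.out.println(shouldRemove(healthMap, 121_000L));
        System.out.println(shouldRemove(healthMap, 121_001L));
    }
}
```

Passing the current time as an argument keeps the decision deterministic in tests; production code would supply System.currentTimeMillis().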

“Health Check” was last updated: October 12, 2021: fixes #301 add docs for light-scheduler and light-controller (31f7a7e)