Spring Cloud Microservice Full-Stack Practice

1. Overview and Positioning

1.1 Background and Challenges of Microservices

Microservices split business logic into multiple autonomous small services that evolve as independently deployable units. This decouples teams and improves delivery efficiency, but it also introduces runtime governance challenges: service discovery, configuration distribution, interface compatibility, fault tolerance and resilience, observability and capacity planning, operations, and security compliance. Without systematic governance capabilities, the cost of running microservices rises quickly.

1.2 Spring Cloud Ecosystem and Version Matrix

Spring Cloud manages versions using a “Release Train” approach, and each release train is compatible only with specific Spring Boot versions. A common modern combination is Spring Boot 3.x with Spring Cloud 2022.x/2023.x. The evolution of the Netflix components deserves attention:

Ribbon and Hystrix have been retired and replaced by Spring Cloud LoadBalancer and Resilience4j.
Zuul is superseded by Spring Cloud Gateway.
Sleuth has been replaced by Micrometer Tracing (or OpenTelemetry).

1.3 Microservice Capability Overview

From a runtime governance perspective, core capabilities include:

Service registration and discovery: Location transparency is achieved through stable service names.
Configuration center: Centralized configuration and dynamic refresh, reducing manual synchronization and differences between environments.
Service communication and gateway: A unified entry point for governance, authentication, and routing, improving consistency and security.
Fault tolerance and rate limiting: Circuit breaking, degradation, retry, and bulkhead isolation to avoid cascading failures.
Observability: Health checks, metrics, distributed tracing, and log correlation are used to build a monitoring and alerting system.
Data consistency: Cross-service transactions and event-driven flows ensure business correctness.

2. Service Registration and Discovery

2.1 Core Concepts and Terminology

Service Registry: A database that stores service instances and their network locations (host/port).
Instance leases and heartbeats: Clients periodically report their liveness status, and the registry maintains the leases; expired instances will be removed.
Discovery and load balancing: The caller retrieves the list of instances for a service name from the registry and picks one according to a load-balancing strategy.
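The three concepts above can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the class name LeaseRegistry, its methods, and the caller-supplied clock are hypothetical and do not reflect how Eureka, Consul, or Nacos are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative in-memory registry (hypothetical, not a real registry implementation). */
class LeaseRegistry {
    private final long leaseDurationMillis;
    // serviceName -> (instance address -> time of last heartbeat)
    private final Map<String, Map<String, Long>> leases = new ConcurrentHashMap<>();

    LeaseRegistry(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    /** Registers an instance or renews its lease (a heartbeat). */
    void heartbeat(String service, String address, long nowMillis) {
        leases.computeIfAbsent(service, s -> new ConcurrentHashMap<>()).put(address, nowMillis);
    }

    /** Removes every instance whose last heartbeat is older than the lease duration. */
    void evictExpired(long nowMillis) {
        for (Map<String, Long> instances : leases.values()) {
            instances.values().removeIf(last -> nowMillis - last > leaseDurationMillis);
        }
    }

    /** Returns the known instances of a service, sorted for stable output. */
    List<String> lookup(String service) {
        List<String> result = new ArrayList<>(leases.getOrDefault(service, Map.of()).keySet());
        result.sort(null);
        return result;
    }
}
```

A caller would invoke heartbeat periodically, while the registry runs evictExpired on a timer; lookup then returns only instances with a live lease.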

2.2 Component Comparison: Eureka / Consul / Nacos

Eureka: Favors availability with eventual consistency, Java-friendly ecosystem, easy deployment and integration.
Consul: Strong consistency, built-in key-value store and health checks, good cross-language ecosystem, suitable for multi-stack teams.
Nacos: Integrates service discovery and a configuration center, is cloud-native friendly, has an active community in China, and covers a wide range of functions.

Selection strategy: Prioritize the team’s technology stack, governance habits, and infrastructure. If unified “service discovery + configuration” is required, Nacos is a natural first choice; for cross-language compatibility and consistency guarantees, consider Consul; for a pure Java stack and easy deployment, Eureka is an option.

2.3 Quick Start Guide: Setting up a Eureka Server

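// RegistryApplication.java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

@SpringBootApplication
@EnableEurekaServer // turns this application into a Eureka registry
public class RegistryApplication {
  public static void main(String[] args) {
    SpringApplication.run(RegistryApplication.class, args);
  }
}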

# application.yml (Eureka Server)
server:
  port: 8761
eureka:
  client:
    register-with-eureka: false
    fetch-registry: false

2.4 Client Registration and Load Balancing

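// OrderServiceApplication.java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.openfeign.EnableFeignClients;

// With the Eureka client starter on the classpath, registration is automatic.
// @EnableEurekaClient is unnecessary and is not available in recent release trains.
@SpringBootApplication
@EnableFeignClients // scans for the @FeignClient interfaces used for inter-service calls
public class OrderServiceApplication {
  public static void main(String[] args) {
    SpringApplication.run(OrderServiceApplication.class, args);
  }
}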

2.4.1 OpenFeign Inter-Service Calls

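import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

@FeignClient(name = "inventory-service") // resolved through service discovery
public interface InventoryClient {
  @GetMapping("/inventory/{sku}")
  InventoryDto getBySku(@PathVariable("sku") String sku); // explicit name avoids relying on -parameters
}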

2.4.2 Spring Cloud LoadBalancer Strategy

Spring Cloud LoadBalancer uses round-robin by default, but can be extended with weights and custom selectors. It is recommended to expose stable service names and provide health probes, combined with instance tags (metadata), to achieve finer-grained routing.
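To make the default strategy concrete, here is a minimal round-robin sketch in plain Java. The class RoundRobinSelector is hypothetical and does not use Spring Cloud LoadBalancer's own types; it only shows the rotation idea.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector; Spring Cloud LoadBalancer's classes are not used here. */
class RoundRobinSelector {
    private final AtomicInteger position = new AtomicInteger();

    /** Picks the next instance in rotation, or null when no instance is available. */
    String choose(List<String> instances) {
        if (instances.isEmpty()) {
            return null;
        }
        // Mask the sign bit so the index stays non-negative after integer overflow.
        int index = (position.getAndIncrement() & Integer.MAX_VALUE) % instances.size();
        return instances.get(index);
    }
}
```

A weighted variant could, for example, repeat an instance in the list in proportion to its weight before applying the same rotation.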

2.4.3 Health Checks and Lease Management

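The client needs health reporting enabled (the management.health settings of Spring Boot Actuator); the registry controls the instance lifecycle based on heartbeats and leases. Properly configuring leaseRenewalInterval and leaseExpirationDuration (in Eureka, eureka.instance.lease-renewal-interval-in-seconds and eureka.instance.lease-expiration-duration-in-seconds) can improve availability in scenarios with long garbage collection pauses or network fluctuations.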

3. Configuration Center and Unified Configuration Governance

3.1 Config Server Architecture Design

Config Server pulls configurations from a Git repository and provides remote configurations by application name and profile. It ensures versioning and auditing of configurations, making it suitable for teams that emphasize unified control over code, configuration, and changes.

# application.yml (Config Server)
server:
  port: 8888
spring:
  cloud:
    config:
      server:
        git:
          uri: https://example.com/config-repo.git

3.2 Client Bootstrapping and Dynamic Refresh

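The client imports the remote configuration at startup through spring.config.import (which replaced the separate bootstrap phase as the default since Spring Cloud 2020.0). Beans annotated with @RefreshScope pick up changed values without a restart after a POST to /actuator/refresh:

# application.yml (client)
spring:
  application:
    name: order-service
  config:
    import: "optional:configserver:http://localhost:8888"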
management:
  endpoints:
    web:
      exposure:
        include: refresh,health,info

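import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RefreshScope // the bean is re-created on refresh, so @Value is re-evaluated
@RestController
public class ConfigController {
  @Value("${order.maxParallel:8}") // falls back to 8 when the key is absent
  private int maxParallel;
  @GetMapping("/cfg/maxParallel")
  public int maxParallel() { return maxParallel; }
}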

3.3 Access Control, Security and Key Management

Least privilege: Config Server grants tiered read permissions per environment and accesses the Git repository with read-only deploy keys.
Protecting sensitive information: Store secrets encrypted or in an external secret manager (Vault/KMS); never commit plaintext secrets to the repository.
Access control and auditing: Every configuration change must have a recorded origin and approval process.

3.4 Trade-offs with Nacos Configuration Center

Nacos integrates service discovery and configuration, reducing component complexity; Config Server emphasizes Git versioning and auditing. If your team prioritizes change rollback and code-based management, Config Server is the preferred choice; if you desire rapid integration and unified platform governance, Nacos is the better option.

4. Service Communication and API Gateway

4.1 Communication Modes: Synchronous HTTP / Asynchronous Messages

Synchronous calls: Timely responses, suitable for highly interactive scenarios, but caller and callee are coupled in time and resources; timeout and fault tolerance strategies are required.
Asynchronous messages: Decoupling and peak shaving, suitable for event-driven systems; idempotency, ordering, and retry strategies must be handled.

4.2 OpenFeign Design Specifications and Contracts

Contract stability: Avoid leaking internal models by using dedicated DTOs; manage evolution through versioned paths or headers.
Error handling: Unified exception wrapping and error codes; timeout and retry strategies are configured according to the characteristics of each downstream service.
Security and auditing: The TraceId is propagated uniformly at the calling layer, and authentication and signature verification are performed.

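import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;

@FeignClient(name = "payment-service")
public interface PaymentClient {
  @PostMapping("/payments") // PaymentReq/PaymentResp are dedicated DTOs, not internal models
  PaymentResp pay(@RequestBody PaymentReq req);
}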

4.3 Spring Cloud Gateway Routing

spring:
  cloud:
    gateway:
      routes:
        - id: order_route
          uri: lb://order-service
          predicates:
            - Path=/api/orders/**
          filters:
            - StripPrefix=1

4.3.1 Predicates and Filters

Predicates determine whether a route matches; filters process requests and responses. Common filters include RewritePath, header manipulation (for example AddRequestHeader), StripPrefix, and RequestRateLimiter.

4.3.2 Authentication, Rate Limiting, and Canary Release

Authentication: Unified login status and token verification, with a second verification step for sensitive interfaces.
Rate limiting: Token bucket rate limiting is applied per user, IP, API key, or other dimensions to prevent overload.
Canary release: Canary releases and parallel versions are achieved through registry metadata or weight-based routing at the gateway.
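The token bucket idea mentioned above can be sketched in plain Java as follows. This is a simplified illustration, not the gateway's RequestRateLimiter implementation; the class TokenBucket and the caller-supplied clock are assumptions made to keep the behavior deterministic.

```java
/** Illustrative token bucket; the caller supplies the clock so the behavior is deterministic. */
class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerMilli;  // tokens added per millisecond
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Refills according to elapsed time, then consumes one token if available. */
    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // bucket empty: reject or queue the request
    }
}
```

To limit per user, IP, or key, one bucket per dimension value could be kept in a map and looked up on each request.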

4.3.3 Unified Logging and Observation Points

Inject a unified logging and tracing context at the gateway layer so that every request entering a backend service carries a TraceId and key business tags, which makes cross-service troubleshooting and statistics easier.
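As a rough sketch of what "injecting a tracing context" means, the helper below guarantees that a request's headers carry a trace id. The class TraceContext and the header name X-Trace-Id are assumptions for illustration only; tracing libraries such as Micrometer Tracing handle propagation themselves, typically with standardized headers like W3C traceparent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical gateway helper; the header name X-Trace-Id is an assumption. */
class TraceContext {
    static final String TRACE_HEADER = "X-Trace-Id";

    /** Returns a copy of the headers that is guaranteed to carry a trace id. */
    static Map<String, String> withTraceId(Map<String, String> incoming) {
        Map<String, String> headers = new HashMap<>(incoming);
        String existing = headers.get(TRACE_HEADER);
        if (existing == null || existing.isBlank()) {
            // No upstream trace id: start a new trace at the gateway.
            headers.put(TRACE_HEADER, UUID.randomUUID().toString().replace("-", ""));
        }
        return headers;
    }
}
```

An existing id is preserved so that a trace started upstream continues through the gateway instead of being split.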

5. Stability Engineering: Fault Tolerance and Rate Limiting

5.1 Resilience4j Core Patterns

Circuit Breaker: Protects the system by failing fast when the failure rate rises; probes for recovery in the half-open state.
Rate Limiter: Limits the number of calls per unit of time to prevent overload.
Bulkhead: Isolates different calls using thread pools or concurrency limits to prevent cascading failures.
Retry: Allows a limited number of retries for transient failures, combined with a backoff strategy; avoid retrying non-idempotent operations.
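The closed, open, and half-open states of a circuit breaker can be sketched as a small state machine. This is a deliberately simplified illustration: the class SimpleCircuitBreaker is hypothetical and counts consecutive failures, whereas Resilience4j evaluates a failure rate over a sliding window.

```java
/** Simplified circuit breaker state machine (hypothetical; not Resilience4j's implementation). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;     // consecutive failures that open the circuit
    private final long openDurationMillis;  // how long calls are rejected before probing
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    /** Returns true if a call may proceed at the given time. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= openDurationMillis) {
            state = State.HALF_OPEN; // let a probe call through
        }
        return state != State.OPEN;
    }

    /** A successful call (including a probe) closes the circuit. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failed probe, or too many consecutive failures, opens the circuit. */
    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMillis = nowMillis;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

The caller checks allowRequest before each downstream call and reports the outcome afterwards; while the circuit is open, the caller returns a fallback immediately instead of waiting on a failing dependency.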

5.2 Circuit Breakers and Degradation Strategies

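import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.springframework.stereotype.Service;

@Service
public class PricingService {
  private final InventoryClient inventory;
  public PricingService(InventoryClient inventory) { this.inventory = inventory; }

  // Calls are guarded by the "inventory" circuit breaker instance.
  @CircuitBreaker(name = "inventory", fallbackMethod = "fallback")
  public Price calc(String sku) { return inventory.getBySku(sku).toPrice(); }

  // Fallback: same parameters plus the exception; returns a degraded default price.
  public Price fallback(String sku, Throwable t) { return Price.zero(); }
}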

5.3 Isolation, Retry, and Timeout

Isolation: Set independent thread pools or concurrency limits based on the importance of each call; avoid cascading failures caused by shared resources.
Retry: The maximum number of retries and the backoff time need to be coordinated with the downstream SLA; write operations must be idempotent before they are retried.
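Timeout: Derive each call's timeout from the downstream latency SLA, and keep the total budget (timeout multiplied by attempts, plus backoff) within the caller's own deadline, so that slow dependencies cannot block threads for long.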

resilience4j:
  circuitbreaker:
    instances:
      inventory:
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 50
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
  retry:
    instances:
      inventory:
        maxAttempts: 3
        waitDuration: 200ms
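What the retry settings above (maxAttempts, waitDuration) mean at runtime can be sketched in plain Java. The helper below is hypothetical and not Resilience4j code; as an assumption for illustration it doubles the wait after each failure (exponential backoff), whereas the configuration above uses a fixed wait.

```java
import java.util.function.Supplier;

/** Illustrative retry helper (hypothetical; not part of Resilience4j). */
class RetrySketch {
    /**
     * Calls the supplier up to maxAttempts times, doubling the wait after each failure.
     * Only wrap idempotent operations with this.
     */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long initialWaitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long wait = initialWaitMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(wait);
                    wait *= 2; // exponential backoff
                }
            }
        }
        throw last; // all attempts failed: surface the last error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }
}
```

With maxAttempts set to 3, a call that fails twice and then succeeds returns normally on the third attempt; a call that fails three times rethrows the last exception.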

5.4 Strategy Orchestration and Best Practices

Recommended sequence: Apply rate limiting and bulkheads first, then retry and circuit breaking, to reduce amplified traffic and cascading failures.
Granularity recommendation: Set independent strategies for each external dependency (database, cache, third-party interfaces).
Observability feedback loop: Expose metrics and events for each strategy to support alerting and parameter tuning.

6. Observability and Operations

6.1 Actuator Health Check

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,env,loggers
  endpoint:
    health:
      show-details: always

Health checks need to cover databases, caches, message brokers, external dependencies, and so on; which dependencies affect the reported status (the “impact surface”) should be decided according to the environment and the service's role (gateway/core service).

6.2 Micrometer Metrics System

Micrometer provides a unified metrics facade for the JVM, threads, HTTP, databases, and custom business logic. When integrated with Prometheus, it enables visualization and alerting via Grafana.

Key metrics: error rate, P95/P99 latency, thread pool activity, queue length, rejection count, GC pauses, slow database queries, and connection pool utilization.
Alert thresholds: Set dynamic thresholds based on historical baselines and business SLAs to avoid “alert storms”.
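To make P95/P99 concrete, the sketch below computes a percentile from raw latency samples using the nearest-rank method. It is a plain-Java illustration with a hypothetical class name; Micrometer itself derives percentiles from histograms rather than from a sorted array of samples.

```java
import java.util.Arrays;

/** Illustrative percentile calculation over raw samples (hypothetical helper). */
class LatencyStats {
    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

For 20 samples of 10, 20, ..., 200 ms, the P95 under this definition is the 19th smallest sample, 190 ms, which shows why tail percentiles react to a few slow requests that an average would hide.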

6.3 Distributed Tracing: Micrometer Tracing / OpenTelemetry

In Spring Boot 3.x, Micrometer Tracing is recommended (with Brave or OpenTelemetry as the underlying tracer). Automatic context propagation and TraceId/SpanId injection make it quick to locate problems in cross-service call chains; combined with logs and metrics, this forms a complete observability system.

6.4 Log Correlation, Alerting, and Capacity Planning

Log correlation: Using the TraceId as the correlation key, core fields (tenant, user, order number, etc.) are output in a uniform format for easy retrieval and statistics.
Alerting: Event-driven alerts (threshold, rate, ratio) are integrated with the ticketing system for closed-loop troubleshooting.
Capacity planning: Assess resources based on daily/weekly peak loads and cache hit rate; set up elastic scaling strategies and warm-up procedures.

7. Data Consistency and Engineering Implementation

7.1 Domain Boundaries and Microservice Partitioning

Identify aggregate and context boundaries using Domain-Driven Design (DDD) to avoid using database tables or technical components as the basis for service segmentation. Clear boundaries contribute to stable interface contracts and team collaboration.

7.2 Transaction Modes and Eventual Consistency

Cross-service transactions typically use the Saga or Outbox pattern:

Saga: A long transaction is broken down into local transactions and compensating actions, coordinated either by a central orchestrator or by events (choreography).
Outbox: The business write and the outgoing message record are committed within the same local database transaction, and the message is then delivered asynchronously to the message bus, ensuring that events are not lost.
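The orchestrated form of a saga can be sketched as follows: steps run in order, and when one fails, the steps already completed are compensated in reverse order. The classes SagaStep and SagaOrchestrator are hypothetical and run in memory; a real saga would persist its progress so it can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** One saga step: a local transaction plus the action that undoes it (hypothetical type). */
class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

/** Minimal in-memory orchestrator; state is not persisted in this sketch. */
class SagaOrchestrator {
    /** Runs steps in order; on failure, compensates completed steps in reverse. Returns true on success. */
    static boolean execute(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run(); // most recent step first
                }
                return false;
            }
        }
        return true;
    }
}
```

For an order flow of reserve stock, charge payment, and ship, a failure in shipping would trigger a refund and then a stock release, leaving the system consistent without a distributed lock.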

Idempotency keys, idempotency tables, and deduplication mechanisms are the infrastructure that makes these patterns safe under retries and redelivery.
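A minimal sketch of deduplication by idempotency key is shown below. The class IdempotentHandler is hypothetical, and its in-memory set stands in for an idempotency table; a real system would persist the keys, ideally in the same transaction as the side effect.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for an idempotency table (hypothetical; keys are not persisted). */
class IdempotentHandler {
    private final Set<String> processedKeys = ConcurrentHashMap.newKeySet();

    /** Runs the handler once per idempotency key; returns false for duplicates. */
    boolean handleOnce(String idempotencyKey, Runnable handler) {
        if (!processedKeys.add(idempotencyKey)) {
            return false; // already processed: skip the side effect
        }
        try {
            handler.run();
            return true;
        } catch (RuntimeException e) {
            processedKeys.remove(idempotencyKey); // allow a later redelivery to retry
            throw e;
        }
    }
}
```

A message consumer could use the message id or a business key (for example an order number plus event type) as the idempotency key, so that a redelivered message does not apply the same change twice.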

7.3 API Contracts and Testing Strategies

Contract testing: Protects consumers when the interface changes; provides mock and contract validation pipelines.
The testing pyramid: unit testing (rapid feedback) → contract and integration (interface correctness) → end-to-end (business path) → chaos engineering (stability and resilience).

7.4 CI/CD and Release Strategy

Zero-downtime release: Combine blue-green and canary deployments; database changes use dual-write or backward-compatible strategies.
Environment injection: Configuration and secrets are injected per environment, avoiding hardcoding and environment drift.
Rollback and versioning: Prepare rollback plans and version compatibility strategies in advance to reduce risk.

8. Advanced Architecture and Cloud-Native Evolution

8.1 Service Mesh and Offloading to the Data Plane

Service meshes (such as Istio) offload traffic governance, observability, and encryption capabilities to the data plane (Sidecar), allowing the application layer to focus on business logic. For existing Spring Cloud architectures, a smooth migration is possible: gateways and some policies are retained, while traffic governance is gradually offloaded.

8.2 Event-Driven and Message Middleware

Event-centric systems are easier to scale and decouple. Message middleware (Kafka/RabbitMQ/RocketMQ) carries the event stream and the subscriber model. Key design considerations include idempotency, ordering semantics, retry and dead-letter queues, and event versioning strategies.

8.3 Multi-tenancy and Isolation Strategies

Logical isolation: Tenant identification runs through the call chain and storage layer.
Resource isolation: independent thread pools, connection pools, and rate limiting quotas.
Data isolation: database/table/row level policies and encryption compliance.

8.4 Cost and Capacity Governance

Cost visualization: Costs are aggregated by service and resource dimensions; hotspots with poor cost-efficiency are optimized first.
Capacity model: Capacity planning is based on traffic forecasts and stress testing results; elasticity strategies and cold-start warm-up are configured accordingly.
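One simple way to turn a traffic forecast and a stress test result into an instance count is sketched below. The formula and the class CapacityPlanner are assumptions for illustration, not a method prescribed by this article: the peak load is divided by the throughput one instance sustains at a chosen target utilization.

```java
/** Illustrative capacity estimate (hypothetical helper and formula). */
class CapacityPlanner {
    /**
     * Instances needed so that peak traffic keeps each instance at or below the target utilization.
     * peakQps comes from the traffic forecast, qpsPerInstance from stress testing.
     */
    static int requiredInstances(double peakQps, double qpsPerInstance, double targetUtilization) {
        if (peakQps < 0 || qpsPerInstance <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid capacity inputs");
        }
        return (int) Math.ceil(peakQps / (qpsPerInstance * targetUtilization));
    }
}
```

For example, a forecast peak of 1200 QPS, a measured 200 QPS per instance, and a 50% utilization target yield 12 instances; the headroom left by the utilization target absorbs bursts and cold starts.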

9. Summary and Extension (General)

9.1 Review and Expansion of Key Knowledge Points

This article establishes a knowledge framework for Spring Cloud microservice governance from a “general-specific-general” perspective: service registration and discovery, configuration center, communication and gateway, stability engineering, observability, data consistency and engineering implementation, as well as the advanced evolution of cloud-native technologies. In practice, it is recommended to base governance on domain-driven and contract-based stability, enhance resilience through policy orchestration and observation loops, and continuously manage capacity and cost.

The expansion directions include introducing service mesh to address traffic governance, adopting an event-driven architecture to improve scalability, upgrading configuration and key management to enhance compliance capabilities, and improving the organization’s stability engineering capabilities through chaos engineering and fault drills.

9.2 Further Reading and Resources

We encourage you to read this article alongside the resources below and to follow my subsequent advanced articles (service mesh implementation, event-driven architecture, chaos engineering practices, etc.) to improve your overall governance capabilities and engineering quality.

Spring Cloud Doc
Spring Boot Doc
OpenFeign Doc
Spring Cloud Gateway
Nacos Doc