Fundamentals & Trade-Offs
What are the primary trade-offs when choosing between a monolith and microservices?
What are the primary characteristics of a microservices architecture compared to a monolithic one?
How does independent scaling of individual services work, and what advantage does it give over scaling a monolith?
When would you recommend against using microservices for a new project?
What is the fundamental difference between Service-Oriented Architecture (SOA) and the modern microservices architectural style?
What are the primary drivers for moving from a monolith to microservices, and conversely, when is a monolith the better choice for a team?
How do microservices differ from a serverless (Function-as-a-Service) architecture, and when would you choose one over the other?
What does the 'you build it, you run it' ownership model mean, and how does it change how teams operate microservices?
Why is debugging a distributed microservices system fundamentally harder than debugging a monolith?
What is the 'operational tax' of microservices, and what added complexities arise in testing, deployment, and debugging compared to a monolithic system?
What are the fallacies of distributed computing, and how do they apply to microservices?
Service Decomposition & Boundaries
In the context of Domain-Driven Design, what is a 'Bounded Context' and how does it relate to microservice boundaries?
Explain Conway's Law and how it influences the design and success of a microservices architecture.
What are the trade-offs of sharing code between services via a shared library versus duplicating it?
How do you decide whether a piece of functionality should be a new microservice or added to an existing one?
How do you determine the right size for a service, and what are the dangers of making a service too small (nanoservices)?
How do you decide where to draw the boundaries between services, and what is the difference between decomposing by business capability vs. by sub-domain?
What is the relationship between high cohesion and loose coupling in the context of microservices, and how do you measure them?
What is the difference between decomposing by business capability versus decomposing by sub-domain?
How do you handle shared logic or common code across multiple services without creating tight coupling?
What is an Anti-Corruption Layer in Domain-Driven Design, and when would you use one between services?
Inter-Service Communication
What are the trade-offs between synchronous (REST/gRPC) and asynchronous (message-driven) communication, and when would you choose one over the other?
When would you choose gRPC over REST for internal microservices communication?
Explain the publish/subscribe pattern and how it enables loose coupling between microservices.
What is the difference between point-to-point messaging and a broker-based/pub-sub model in inter-service communication?
How is load balancing performed across multiple instances of a service, and how does it interact with service discovery?
What is the difference between service orchestration and service choreography, and which is more scalable and why?
API Gateway & Service Discovery
What is the role of an API Gateway, and how does it differ from a load balancer?
How does containerization fit the microservices model, and why is it such a natural pairing?
What is the role of a container orchestrator in running microservices, conceptually?
What is the "Backend-for-Frontend" (BFF) pattern, and what problem does it solve for mobile vs. web clients?
Explain the "Sidecar" pattern and how it is used to offload cross-cutting concerns.
What is "Service Discovery," and why is it needed in dynamic, cloud-native environments?
How do microservices find and communicate with each other in a dynamic environment, and what is the difference between client-side and server-side discovery?
What is the role of an API Gateway, and how does the 'Backend for Frontend' (BFF) pattern differ from a generic gateway?
What is the difference between an API Gateway aggregating requests and a service directly calling multiple downstream services?
What is a service registry, and how does registration and health-based deregistration work?
What is a Service Mesh (including the concept of a sidecar), and how does its purpose differ from an API Gateway?
Resilience & Fault Tolerance
What is the difference between a retry and a fallback strategy?
Explain the 'Circuit Breaker' pattern — what are its three states and why is it used?
What does it mean for a service to be "Idempotent," and why is this critical in an event-driven microservices system?
What is 'Graceful Degradation' and can you give an example in a microservices context?
How do you implement "Retry with Exponential Backoff and Jitter," and why is jitter important?
What are rate limiting and throttling, and why are they important protective patterns in a microservices system?
Why is setting appropriate timeouts critical in inter-service calls, and what happens if you rely on default timeouts?
Explain the "Bulkhead" pattern and how it prevents a single service failure from taking down the entire system.
What is cascading failure, and how do you prevent it in a distributed system?
What is 'Backpressure' and why is it important in an event-driven microservices system?
How do you prevent a 'Retry Storm' when a downstream service is struggling?
Explain the difference between a 'Retry with Backoff' and a 'Bulkhead' pattern, and when you would use one over the other.
How do you prevent a single slow downstream service from taking down your entire system?
Data Management & Consistency
What is polyglot persistence, and what are its advantages and disadvantages?
How do you explain eventual consistency to a business stakeholder who expects immediate data updates?
Explain the concept of eventual consistency. In what business scenarios is it unacceptable?
What is Command Query Responsibility Segregation (CQRS), and in what scenarios does it become necessary in a microservices environment?
How do you query or join data spread across three different microservices with three different databases, and how do API Composition and CQRS compare as approaches?
Why is the 'database-per-service' pattern recommended, and what are the challenges when you need to join or query across data owned by different services?
What is event sourcing, and how does it relate to microservices and CQRS?
How do you keep duplicated data in sync across services when each service owns its own copy?
What is the API Composition pattern, and what are its limitations compared to CQRS for cross-service queries?
How would you approach caching within a microservices architecture, and what are the pitfalls of shared cache state?
Distributed Transactions & Messaging Patterns
What is a compensating transaction and how does it differ from a traditional database rollback?
Why is two-phase commit (2PC) generally avoided in microservices, and what problems does it introduce?
What is the 'Transactional Outbox' pattern, and how does it solve the problem of atomically updating a database and sending a message to a broker?
Explain the difference between Orchestration and Choreography in a Saga and the trade-offs of each.
Explain how the Saga pattern manages distributed transactions and how you handle a failure in the middle of a multi-service workflow with compensating transactions.
How do you handle distributed transactions across multiple services without using two-phase commit?
How do you handle "Eventual Consistency" in a system where a user expects immediate feedback?
What is the inbox pattern, and how does it complement the outbox pattern for reliable message processing?
How do you evolve event schemas over time without breaking downstream consumers in an event-driven system?
How does the CAP theorem force design decisions like eventual consistency and compensating transactions in microservices?
Observability & Monitoring
How do you track a single user request as it travels through ten different microservices, and what is a Correlation ID?
Why is centralized logging more important in microservices than in a monolith?
What are health checks, and how do they support self-healing in an orchestrated microservices environment?
What is the difference between a liveness check and a readiness check, and why does an orchestrator need both?
What are the "Three Pillars of Observability" (Metrics, Logs, Traces) and how do they apply to microservices?
What is the difference between "Log Aggregation" and "Distributed Tracing"?
Why are logs alone insufficient in microservices, and what are the roles of metrics and distributed tracing in maintaining system health?
What metrics would you monitor to understand the health and performance of a microservices system?
Security
How do you handle authentication and authorization across services, and how is a JWT typically propagated from the gateway to downstream services?
What is mTLS (Mutual TLS), and why is it used for service-to-service communication?
What is the zero trust security model in the context of microservices?
How do you manage secrets (API keys, DB credentials) in a system with hundreds of services?
Why does a microservices architecture increase the attack surface, and how do you mitigate the added security risk?
Deployment & Configuration
What is the strangler fig pattern, and how is it used to migrate a monolith to microservices?
Why is the shared database considered an anti-pattern in microservices, and what are the risks of ignoring this rule?
What is the difference between Blue-Green deployment and Canary deployment in a microservices context?
What does it mean for a service to be 'independently deployable,' and what happens to the architecture if this requirement is violated?
How do you manage externalized configuration across many services, and what are the trade-offs of centralized versus per-service configuration?
What role do feature flags/toggles play in safely deploying and releasing microservices?
What is a 'distributed monolith,' how does it happen, and why is it considered an anti-pattern?
How do you handle breaking changes in a service's API when multiple other services depend on it, using strategies like semantic versioning or parallel versioning?
Which of the 12 factors are most critical for ensuring a microservice is truly cloud-native and independently deployable?
Testing
What is consumer-driven contract (CDC) testing, and why is it preferred over traditional integration testing for microservices?
What does the testing pyramid look like for microservices, and why do end-to-end tests become problematic at scale?
How do integration and end-to-end testing challenges differ in microservices compared to a monolith?