Designing a Highly Available and Resilient Payment Gateway

A Functional Overview

8 engineers
Financial Services
Advanced Level

Executive Summary

In a digital-first world where commerce spans physical and online channels, a highly available, scalable, and resilient payment gateway is mission-critical. This case study examines the architecture, components, and operational workflows of an omnichannel payment gateway that handles both Point-of-Sale (PoS) and Unified Payments Interface (UPI) transactions while maintaining high uptime and security.

What Is a Payment Gateway?

A payment gateway is a technology platform that facilitates payment transactions between customers, merchants, and financial institutions. It enables merchants to accept various payment methods like cards, bank transfers, mobile wallets, and UPI.

The gateway:

  • Authenticates payment requests
  • Routes them to the correct financial institution
  • Returns an authorization or failure response

Business Context & Design Goals

Modern commerce requires a payment infrastructure that can handle multi-channel payments, support various payment methods, ensure regulatory compliance, and provide exceptional reliability at massive scale.

High Availability

99.99% uptime with multi-region failover and circuit breakers

99.99%
Uptime SLA
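
To put the 99.99% target in perspective, the arithmetic below converts an availability percentage into a downtime budget. It is a minimal sketch: the function name and the 365-day year are assumptions for illustration, not figures from the case study.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum downtime the availability target allows within the period."""
    return period_minutes * (1 - availability_pct / 100)


# Roughly 52.56 minutes per year, or about 4.32 minutes per 30-day month.
yearly = round(downtime_budget_minutes(99.99), 2)
monthly = round(downtime_budget_minutes(99.99, 30 * 24 * 60), 2)
print(yearly, monthly)
```

A budget this small is why failover has to be automatic: a single manually handled incident could consume most of a month's allowance.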

Resilience

Graceful degradation with retry mechanisms and fallback routing

<100ms
Response Time

Scalability

Horizontal scaling with auto-scaling and load balancing

10K+
TPS Peak

Security

PCI-DSS Level 1 compliance with tokenization and encryption

0.01%
Fraud Rate

Solution Architecture Overview

High-Level Payment Gateway Architecture

PoS Terminal
Mobile App
Web App

Multi-channel payment initiation

Edge Layer

CDN, Load Balancers, API Gateway

Processing Layer

Microservices, Orchestration, Business Logic

Integration Layer

Banking APIs, Card Networks, UPI PSPs

Payment Processing Flow

Customer Request → Authentication → Risk Analysis → Payment Routing → Response

Channel Support

  • Point-of-Sale (PoS) terminals
  • Mobile applications
  • E-commerce websites
  • API integrations

Payment Methods

  • Credit/Debit Cards
  • UPI Payments
  • Digital Wallets
  • Bank Transfers

Key Features

  • PCI-DSS Compliance
  • Fraud Detection
  • Automatic Failover
  • Real-time Analytics

Core Functional Components

System Architecture Overview

The payment gateway consists of six interconnected services, each responsible for specific aspects of payment processing, working together to ensure secure, reliable transactions.

6
Core Services
2
Payment Methods
Multi
Channels
Global
Scale

1. Edge/API Layer

Entry point for all payment requests

  • Load balancing across multiple regions
  • SSL/TLS termination and certificate management
  • Rate limiting and DDoS protection
  • Request routing and protocol translation

2. Authentication & Authorization

Secure access control for all requests

  • Merchant authentication via API keys/OAuth
  • mTLS for PoS device validation
  • JWT token generation and validation
  • Role-based access control (RBAC)

3. Risk & Fraud Engine

Real-time fraud detection and prevention

  • Machine learning-based fraud scoring
  • Rule-based risk assessment
  • Velocity checks and anomaly detection
  • 3D Secure integration for card payments

4. Payment Orchestrator

Smart routing and fallback management

  • Intelligent routing based on success rates
  • Automatic failover to backup acquirers
  • Load balancing across payment processors
  • Real-time performance monitoring
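
One way to picture routing on success rates is a ranking over healthy processors, with the top entry acting as primary and the rest as fallbacks. The sketch below assumes a hypothetical `ProcessorStats` record and `rank_processors` helper; the real orchestrator's inputs and weighting are not described in the source.

```python
from dataclasses import dataclass


@dataclass
class ProcessorStats:
    """Rolling counters for one acquirer/processor (hypothetical shape)."""
    name: str
    attempts: int
    successes: int
    healthy: bool = True

    @property
    def success_rate(self) -> float:
        # Avoid division by zero for processors with no traffic yet.
        return self.successes / self.attempts if self.attempts else 0.0


def rank_processors(stats: list[ProcessorStats]) -> list[str]:
    """Return healthy processors, best success rate first."""
    healthy = [s for s in stats if s.healthy]
    ranked = sorted(healthy, key=lambda s: s.success_rate, reverse=True)
    return [s.name for s in ranked]
```

The first name in the result would be tried first; unhealthy processors are excluded entirely until monitoring marks them healthy again.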

5. Tokenization Service

Secure storage and management of payment data

  • Replaces sensitive card numbers (PAN) with secure tokens
  • PCI-DSS compliant data storage
  • Token lifecycle management
  • Encryption at rest and in transit
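
The PAN-to-token swap can be sketched as follows. This is an illustration only: `TokenVault` is a hypothetical in-memory stand-in for an HSM-backed vault, it does not encrypt anything, and the token format is an assumption.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a PCI-scoped token vault (illustration only)."""

    def __init__(self) -> None:
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        # Reuse the existing token so one card always maps to one token.
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        # Random token: carries no information derived from the PAN.
        token = "TOK_" + secrets.token_hex(8).upper()
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can map a token back to the PAN.
        return self._token_to_pan[token]


def mask_pan(pan: str) -> str:
    """Keep the first and last four digits for display and logs."""
    return pan[:4] + "*" * (len(pan) - 8) + pan[-4:]
```

Downstream services would only ever see the token or the masked value, which is what keeps them outside the scope of card-data storage rules.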

6. UPI Orchestrator

UPI-specific payment processing

  • Handles UPI-specific flows like QR generation
  • Integrates with NPCI and UPI PSPs
  • Manages UPI intent creation and response handling
  • Real-time status updates and notifications

Real-World Transaction Examples

Understanding Payment Processing

Let's examine how the system handles real-world scenarios, demonstrating resilience, intelligent routing, and seamless user experience across different payment methods.

VISA Debit Card Payment with Fallback

Demonstrating system resilience and intelligent routing

Scenario Context

Transaction Details
  • Location: Retail store in London
  • Amount: £45.99 GBP
  • Card: Visa Debit (masked PAN: 4532****1234)
  • Method: NFC tap payment
Challenge
  • Primary Acquirer: FIS (experiencing issues)
  • Fallback Acquirer: Barclays
  • Expected Outcome: Seamless transaction completion
Step 1

PoS Terminal Processing

Actions Performed
  • Reads NFC data from debit card
  • Extracts PAN, expiry date, CVV
  • Generates unique transaction ID
  • Initiates secure HTTPS request
Sample Payload
{
  "amount": "4599",
  "currency": "GBP",
  "card_number": "4532****1234",
  "merchant_id": "MER123"
}
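
The payload's "4599" appears to be the £45.99 amount expressed in minor units (pence). Assuming that reading, a conversion can be sketched with `Decimal` to avoid floating-point rounding; the `to_minor_units` helper and the exponent table are hypothetical.

```python
from decimal import Decimal

# Minor-unit exponents (ISO 4217) for the currencies used in this case study.
MINOR_UNIT_EXPONENT = {"GBP": 2, "INR": 2}


def to_minor_units(amount: str, currency: str) -> str:
    """Convert a decimal amount such as "45.99" into a minor-unit string."""
    exponent = MINOR_UNIT_EXPONENT[currency]
    scaled = Decimal(amount).scaleb(exponent)  # shift the decimal point
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has too many decimal places for {currency}")
    return str(int(scaled))


print(to_minor_units("45.99", "GBP"))  # prints 4599
```

Passing amounts as strings and integers of minor units keeps every service in the chain in agreement on the exact value.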
Step 2

Authentication & Security

Security Validation
  • mTLS certificate verification
  • Merchant authentication (API keys)
  • Request schema validation
  • Rate limiting enforcement
Tokenization
  • PAN replaced with secure token
  • HSM-backed encryption
  • PCI-DSS compliance maintained
  • Token: TOK_4532_ABC123
Step 3

Risk Assessment

ML Analysis
  • Fraud score: 0.15 (low risk)
  • Geolocation verification
  • Device fingerprinting
  • Behavioral analysis
Rule Engine
  • Velocity check: 3 transactions/hour
  • Amount threshold validation
  • Merchant category verification
  • Decision: APPROVE
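
A velocity check like the one listed above is commonly implemented as a sliding window per card token. The sketch below is one such approach; the `VelocityChecker` class and its limit are assumptions, since the source reports only the observed rate, not the threshold.

```python
from collections import defaultdict, deque


class VelocityChecker:
    """Sliding-window counter keyed by card token (hypothetical helper)."""

    def __init__(self, max_per_window: int, window_seconds: float = 3600.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # token -> timestamps, oldest first

    def allow(self, card_token: str, now: float) -> bool:
        events = self._events[card_token]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_per_window:
            return False  # over the limit: flag or decline
        events.append(now)
        return True
```

Passing `now` explicitly keeps the check deterministic in tests; in production it would come from a monotonic clock, and the counters would live in a shared store rather than process memory.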
Step 4

Intelligent Routing & Fallback

Primary Attempt (FIS)
  • Route to FIS based on BIN routing
  • ISO 8583 message construction
  • 3-second timeout threshold
  • Result: TIMEOUT (system degraded)
Fallback Execution (Barclays)
  • Automatic circuit breaker activation
  • Seamless routing to Barclays
  • Response time: 333ms
  • Result: APPROVED (Auth: 123ABC)
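
The primary-then-fallback behavior in this step can be sketched with a small circuit breaker per acquirer. Everything here is illustrative: `CircuitBreaker`, `authorize_with_fallback`, the thresholds, and the use of `TimeoutError` to signal an acquirer timeout are assumptions, not the gateway's actual implementation.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_seconds=30.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Half-open: permit a trial request once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self._reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()


def authorize_with_fallback(acquirers, breakers, request):
    """Try acquirers in priority order, skipping any whose breaker is open."""
    for name, call in acquirers:
        breaker = breakers[name]
        if not breaker.allow_request():
            continue
        try:
            response = call(request)
        except TimeoutError:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, response
    raise RuntimeError("all acquirers unavailable")
```

Once the primary's breaker is open, later transactions skip it immediately instead of waiting out the timeout, which is what keeps a degraded acquirer from slowing every payment.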
Step 5

Transaction Completion

Success Response
  • Authorization code: 123ABC
  • Total processing time: 1.2 seconds
  • Response status: APPROVED
  • Receipt printed successfully
Post-Processing
  • Settlement file generation
  • Merchant notification sent
  • Audit trail logged
  • Analytics data captured

Key Insights

Resilience Demonstrated
  • Automatic failover without manual intervention
  • Circuit breaker prevented cascade failures
  • Customer experience remained seamless
Performance Metrics
  • Total time: 1.2 seconds (well under 5s SLA)
  • Fallback latency: 333ms
  • Success rate maintained: 99.9%

UPI QR Code Payment

Demonstrating real-time UPI integration and callback handling

Scenario Context

Transaction Details
  • Location: Coffee shop in Mumbai
  • Amount: ₹299 INR
  • Method: UPI QR Code scan
  • App: Google Pay
UPI Integration
  • UPI PSP: Razorpay
  • VPA: customer@okaxis
  • Expected Outcome: Instant payment confirmation
Step 1

QR Code Generation

Merchant Request
  • Cashier initiates ₹299 transaction
  • UPI Orchestrator contacted
  • Dynamic QR code generation
  • 5-minute expiry timer set
QR Code Content
upi://pay?pa=merchant@razorpay
&pn=Coffee+Shop
&am=299.00
&cu=INR
&tr=TXN20240115001
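
The QR content above is a single `upi://pay` URI with query parameters (shown wrapped across lines). A minimal sketch of assembling it with the standard library follows; `build_upi_uri` is a hypothetical helper, and the parameter set is limited to the five fields shown in the example.

```python
from urllib.parse import urlencode


def build_upi_uri(payee_vpa: str, payee_name: str, amount: str,
                  txn_ref: str, currency: str = "INR") -> str:
    """Assemble the UPI deep link that gets encoded into the QR image."""
    params = {
        "pa": payee_vpa,    # payee address (VPA)
        "pn": payee_name,   # payee name
        "am": amount,       # amount as a decimal string
        "cu": currency,     # currency code
        "tr": txn_ref,      # transaction reference
    }
    # safe="@" leaves the VPA's "@" unescaped, matching the example above.
    return "upi://pay?" + urlencode(params, safe="@")


print(build_upi_uri("merchant@razorpay", "Coffee Shop", "299.00", "TXN20240115001"))
```

The resulting string is what a QR library would encode; the 5-minute expiry is enforced server-side against the `tr` reference rather than inside the URI.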
Step 2

Customer Interaction

QR Code Scan
  • Customer opens Google Pay app
  • Scans QR code with phone camera
  • Transaction details parsed
  • Payment confirmation screen shown
UPI Authentication
  • Customer confirms payment
  • UPI PIN entered
  • Biometric authentication
  • NPCI payment request initiated
Step 3

NPCI Processing

Payment Processing
  • NPCI validates transaction
  • Debit request to customer's bank
  • Credit request to merchant's bank
  • Settlement initiated
Real-time Updates
  • Status: PENDING → PROCESSING
  • Webhook callback to our system
  • Customer notification sent
  • Merchant POS updated
Step 4

Success Confirmation

Payment Completion
  • Transaction reference: UPI123456789
  • Total processing time: 800ms
  • Status: SUCCESS
  • Receipt generated
Notifications
  • Customer SMS confirmation
  • Merchant dashboard updated
  • Analytics data captured
  • Settlement scheduled

System Capabilities Summary

Technical Excellence

Multi-Channel Processing

Seamless handling of PoS, web, and mobile transactions

Intelligent Routing

Dynamic routing with automatic failover capabilities

Real-time Fraud Detection

ML-powered risk assessment and prevention

Business Impact

99.99% Uptime

Mission-critical availability with minimal downtime

Sub-second Response

Optimal performance under high-load conditions

PCI-DSS Compliance

Enterprise-grade security and regulatory compliance

Ready for Deep Dive?

This functional overview covered the business context, core components, and real-world transaction flows. Next, we'll explore the technical architecture, AWS infrastructure, and detailed implementation patterns.

Technical Implementation Deep Dive

Real-World Scenario: VISA Debit Card Payment with Fallback

📧
Scenario

A customer taps their Visa debit card on a PoS terminal at a retail store in London. The transaction amount is £45.99. The primary acquirer (e.g., FIS) is experiencing timeouts, so the transaction is routed to a fallback acquirer (e.g., Barclays) and successfully authorized.

1

🟢 PoS Terminal

  • Reads NFC from debit card
  • Constructs ISO8583 message:
      - MTI: 0100 (authorization request)
      - Field 2: PAN, Field 4: Amount
      - Field 41: Terminal ID
  • Encrypts sensitive data, signs message
  • Sends to Gateway Edge Endpoint over mTLS TCP/IP
2

🟡 API Gateway / Edge Layer

  • Validates schema and message type
  • Authenticates merchant (API key/mTLS)
  • Adds x-trace-id, logs initial metadata
  • Forwards to Gateway Orchestrator
3

🟠 Gateway Orchestrator

  • Parses ISO8583 into internal domain object
  • Generates txn-id, correlation-id
  • Stores txn in transient DB (Redis/DynamoDB with TTL)
  • Sends PAN to Tokenization Service
4

🟣 Tokenization Service

  • Converts PAN → token, stores mapping
  • Returns token to Orchestrator
  • PAN replaced with token for downstream processing
5

🔵 Fraud/Risk Engine

  • Runs velocity, geolocation, MCC checks
  • Executes machine learning fraud detection
  • Score returned: safe
6

🟤 Transaction Router (Fallback Logic)

  • Determines BIN: card is Visa Debit → FIS (primary) and Barclays (fallback)
  • Invokes FIS acquirer connector:
      - Timeout occurs (retry x 2 fails)
  • Triggers fallback logic:
      - Invokes Barclays connector
      - Barclays returns ISO8583 response 0110 with AuthCode 123456
7

⚪ Orchestrator Finalization

  • Updates transaction status: Authorized
  • Persists full record (Aurora or DynamoDB)
  • Constructs ISO8583 0110 (auth response)
  • Returns to PoS terminal
8

🟠 Notification & Events

  • Publishes Kafka event: payment.authorized
  • Triggers downstream flows (settlement engine, audit log)

Acquirer Integration Patterns

Pattern | Description
ISO 8583 | Used by legacy acquirers (FIS, First Data, Barclays). Requires message packing/unpacking, MTIs, DEs.
REST/gRPC API | Used by modern PSPs (Adyen, Stripe, Elavon). Faster integration, JSON/gRPC payloads.
File-based Settlement | Batch files uploaded via SFTP for settlement (legacy but still common).
Message Queue Integration | Some PSPs use MQ (IBM MQ, RabbitMQ) for async workflows.

ISO8583 Connector Implementation

Required Fields
MTI | 0100 (request), 0110 (response)
DE2 | PAN
DE3 | Processing Code
DE4 | Amount
DE7 | Date/Time
DE11 | STAN (System Trace Audit Number)
DE39 | Response Code
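
To make the field list concrete, here is a simplified, unpacked view of an authorization request and its response. It is a sketch only: it skips bitmaps and wire encoding, the helper names are hypothetical, and the field widths and the "000000" purchase processing code follow common ISO 8583 usage rather than any specific acquirer's specification.

```python
from datetime import datetime, timezone


def build_auth_request(pan: str, amount_minor: int, stan: int,
                       now: datetime) -> dict:
    """Logical (unpacked) view of a 0100 authorization request."""
    return {
        "MTI": "0100",
        "DE2": pan,                         # PAN (tokenized upstream in practice)
        "DE3": "000000",                    # processing code: purchase (assumed)
        "DE4": f"{amount_minor:012d}",      # amount, 12 digits, minor units
        "DE7": now.strftime("%m%d%H%M%S"),  # transmission date/time, MMDDhhmmss
        "DE11": f"{stan:06d}",              # system trace audit number
    }


def build_auth_response(request: dict, response_code: str) -> dict:
    """Echo the request fields with MTI 0110 and a DE39 response code."""
    response = dict(request)
    response["MTI"] = "0110"
    response["DE39"] = response_code        # "00" conventionally means approved
    return response
```

A real connector would add a bitmap, encode each field per the acquirer's specification, and match responses to requests using DE11 together with the terminal and timestamp fields.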
Fallback Strategy
  • Routing Engine maintains active/standby acquirer mappings
  • Circuit breaker opens if:
      - Consecutive failures > threshold
      - Latency exceeds SLA
  • Fallback routes to next acquirer in priority list
  • Retry with jittered exponential backoff
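
Jittered exponential backoff, the last item above, can be sketched as follows. This uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling; the function name and default values are assumptions, not the gateway's configuration.

```python
import random


def backoff_delays(max_retries: int, base_seconds: float = 0.1,
                   cap_seconds: float = 2.0, rng=random) -> list:
    """Return one randomized delay per retry attempt (full jitter)."""
    delays = []
    for attempt in range(max_retries):
        # Ceiling doubles each attempt but never exceeds the cap.
        ceiling = min(cap_seconds, base_seconds * 2 ** attempt)
        # Random delay spreads retries so clients do not retry in lockstep.
        delays.append(rng.uniform(0, ceiling))
    return delays
```

For card authorizations the retry budget has to stay well inside the terminal's overall timeout, so the cap and retry count would be tuned against the acquirer timeout rather than chosen independently.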

Multi-Region AWS Architecture

🎯 Objective

Ensure low latency, global scale, and zero-downtime failover.

🇪🇺 eu-west-1 (Ireland)

Primary for UK/Europe

🇺🇸 us-east-1 (Virginia)

Backup or active-active

🔒 Resilience Techniques
Area | Strategy
EKS | Multi-AZ nodes, zonal balancing, HPA + VPA
DB | Aurora Global DB (failover in <60s), write forwarding
Secrets | Replicated with AWS Secrets Manager
Kafka | MSK multi-AZ with geo-replication (optional)
API Gateway | Regional + global acceleration with health-check-based failover

PoS-Based Card Payment: Standard Flow

1

Point of Sale (PoS) Terminal

The customer taps their debit card

2

API Layer

Validates and authenticates requests

3

Orchestrator

Coordinates tokenisation and fraud checks

4

Router

Selects the acquirer; if the primary fails, falls back to the secondary

5

Acquirer

Approves or declines

6

Orchestrator

Returns response to PoS

7

Post-processing

Event published for settlement, audit, etc.

UPI-Based Payment: End-to-End Flow

1

Customer Action

Selects UPI, enters VPA or scans QR

2

API Layer

Forwards to UPI orchestrator

3

UPI Orchestrator

Initiates UPI intent using PSP

4

NPCI/Bank PSP

Customer approves in their UPI app

5

Callback

PSP confirms success to the gateway

6

Orchestrator

Finalises and returns authorisation

7

Post-processing

Transaction is logged and notifications are sent

Enhanced UPI Payment Flow: Technical Deep Dive

🎯
Scenario

Customer selects "Pay by UPI" on a mobile PoS or eCommerce site. They enter their UPI ID (e.g., user@okicici) or scan a QR code. UPI intent is triggered, and they approve the payment in their UPI app (e.g., GPay or PhonePe).

1

🟢 Frontend / PoS

  • Customer selects UPI payment method
  • Enters UPI ID or scans dynamic QR code
  • Gateway receives JSON payload:
{
  "amount": 2500,
  "currency": "INR",
  "upiId": "user@okicici",
  "txnRef": "TXN123456"
}
2

🟡 Edge/API Gateway

  • Validates JSON payload structure
  • Applies rate limits and API key authentication
  • Forwards to Gateway Orchestrator
3

🟠 Gateway Orchestrator

  • Identifies payment type: UPI
  • Routes to UPI-specific payment handler service
  • Generates correlation ID for tracking
4

🔵 UPI Orchestrator

  • Creates UPI transaction record
  • Generates dynamic QR or intent URL using PSP API
  • Calls registered UPI PSP (Razorpay, Paytm, Cashfree) via REST:
POST /upi/pay
{
  "upiId": "user@okicici",
  "amount": 2500,
  "txnRef": "TXN123456",
  "callbackUrl": "https://gateway.com/upi/status"
}
5

🟣 NPCI/Bank PSP

  • NPCI routes payment request to payer bank
  • Customer gets push notification in UPI app
  • Customer approves → funds debited
  • Status returned to PSP
6

🟤 Callback / Polling

  • PSP sends success/failure callback to POST /upi/status
  • Gateway updates the transaction status (authorized or failed)
  • Triggers downstream settlement or order flow

Key Components Added for UPI Support

Component | Function
Payment Type Resolver | Classifies card, UPI, wallet, etc., and routes internally.
UPI Orchestrator | Handles UPI-specific APIs, QR generation, callback parsing.
UPI PSP Integrations | REST APIs to Razorpay, Cashfree, Paytm, Pine Labs (NPCI certified providers).
Callback Handler | Secure endpoint to receive status updates from PSPs.
Risk Engine Hooks | Optional velocity/duplicate checks on UPI ID.
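
The Payment Type Resolver can be pictured as a small classifier over the incoming payload. The sketch below keys off the field names used in this case study's sample payloads (`upiId`, `card_number`); the enum, the function name, and the `token` field are assumptions for illustration.

```python
from enum import Enum


class PaymentType(Enum):
    CARD = "card"
    UPI = "upi"


def resolve_payment_type(payload: dict) -> PaymentType:
    """Classify a request so the orchestrator can pick the right handler."""
    if "upiId" in payload:
        return PaymentType.UPI
    # "token" is a hypothetical field for already-tokenized card requests.
    if "card_number" in payload or "token" in payload:
        return PaymentType.CARD
    raise ValueError("unsupported payment payload")


# Example: the UPI payload from the flow above resolves to the UPI handler.
print(resolve_payment_type({"amount": 2500, "currency": "INR",
                            "upiId": "user@okicici", "txnRef": "TXN123456"}))
```

Wallets and other methods would add further enum members and rules; rejecting unknown payloads explicitly avoids silently routing them down the card path.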

Gateway Enhancements Required for UPI

Layer | Enhancement
API Contract | New routes: /upi/initiate, /upi/status
Risk Layer | Rules for frequency on UPI ID, UPI handle blacklist, geo-fraud detection
Notification | Send webhook/email after UPI success
Event Bus | Add upi.payment.success, upi.payment.failed Kafka topics
Audit | Store payer VPA, PSP ID, txn ref, timestamp, amount, response code

Multi-Region Support for UPI

Challenge | Solution
NPCI/PSPs are India-region only | UPI orchestrator deployed in ap-south-1 (Mumbai) or edge proxied
Callback locality issues | Edge-distributed API Gateway routes callback to correct region
Latency for PoS terminals abroad | Use regional frontends → route only UPI flows to India-based backends
Resiliency | Multi-AZ EKS in Mumbai + health-based PSP fallback (Razorpay → Paytm)

Security Considerations for UPI

  • mTLS between gateway and PSPs
  • HMAC validation on PSP callbacks
  • Idempotency keys to prevent duplicate charges
  • Logging with masking for UPI ID
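
Two of these controls, HMAC validation and idempotency, can be sketched together in a callback handler. This is a minimal illustration under stated assumptions: the shared secret, the hex-encoded SHA-256 signature, and the `txnRef`/`status` body fields are hypothetical, since each PSP defines its own callback format.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw callback body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class CallbackHandler:
    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._processed = {}  # txnRef -> status already applied

    def handle(self, body: bytes, signature: str) -> str:
        expected = sign(self._secret, body)
        # Constant-time comparison avoids leaking how much of the MAC matched.
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            return "rejected"
        event = json.loads(body)
        txn_ref = event["txnRef"]
        # Idempotency: a replayed callback must not apply the update twice.
        if txn_ref in self._processed:
            return "duplicate"
        self._processed[txn_ref] = event["status"]
        return "accepted"
```

The signature is checked over the raw bytes before parsing, so a tampered body is rejected without ever being interpreted; in production the processed-reference set would live in a shared store with a TTL.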

Monitoring & Metrics for UPI

UPI Success Rate | Ratio of authorized to initiated transactions
Average Approval Time | Time from UPI init to callback
PSP Latency | Razorpay vs Paytm API response time
Callback Delay | Time to get UPI confirmation

For Non-Technical Readers: Key Concepts Simplified

Payment Terms

Acquirer: A bank or processor that accepts card payments on behalf of merchants
Tokenisation: Replacing real card numbers with non-sensitive tokens for safety
UPI: India's mobile-first bank-to-bank payment system

System Terms

Orchestrator: Like a traffic cop, it decides which internal service does what
Fallback Routing: If one processor is down, the system automatically switches to another
Real-Time: Payments are approved/declined in seconds or less

What You'll Learn

  • Payment gateway fundamentals and architecture
  • PoS and UPI transaction flows
  • Event-driven choreography patterns
  • Security and compliance best practices

Key Technologies

AWS, Microservices, Kubernetes, PostgreSQL, Redis, API Gateway, Lambda, CloudFormation

Security & Compliance

Built with PCI DSS Level 1 compliance, end-to-end encryption, and comprehensive audit trails.

99.99%
Uptime
10M+
Transactions/mo
<100ms
Latency
PCI DSS
Level 1

Supported Payment Types

Debit/Credit Cards

Visa, MasterCard, RuPay

UPI Payments

PhonePe, GPay, Paytm

Net Banking

All major banks

Mobile Wallets

Digital wallets

Final Thoughts

A modern payment gateway must be secure, intelligent, and globally scalable, with support for traditional cards and modern methods like UPI. By separating responsibilities into orchestrated services and using cloud-native, fault-tolerant infrastructure, businesses can process payments reliably while innovating on features, fraud detection, and user experience.

Such a platform doesn't just move money; it powers trust in the digital economy.