Curated Data: Seeking Excellence

Exploring what curated data is and how it makes TickersData fast

By Engineering Team January 20, 2025
3 min read
#api #performance #architecture #fintech

Financial markets operate at microsecond precision, where a few milliseconds of latency can mean the difference between profit and loss. Building APIs that serve real-time market data requires careful attention to every layer of the stack, from network protocols to database optimization.

The Performance Imperative

When we started building TickersData, we knew that performance wasn’t optional—it was the foundation of everything we’d create. Traditional APIs simply couldn’t meet the demands of modern algorithmic trading and real-time analytics.

“In financial markets, latency is literally money. Every millisecond of delay costs our users potential profits.”

Key Performance Metrics

Our target performance benchmarks were ambitious but necessary:

  • Sub-5ms response times for real-time data
  • 99.99% uptime with automatic failover
  • 1M+ requests per second peak capacity
  • <100ms end-to-end data freshness

Architecture Overview

We designed our system around three core principles: speed, reliability, and scalability. Here’s how we structured the architecture:

Data Ingestion Layer

The first challenge was ingesting market data from multiple sources simultaneously:

interface DataSource {
  id: string;
  protocol: 'websocket' | 'tcp' | 'udp';
  latency: number;
  reliability: number;
}

class DataAggregator {
  private sources: Map<string, DataSource> = new Map();

  async ingest(data: MarketData): Promise<void> {
    // Parallel processing with circuit breakers; allSettled keeps one
    // failing source from failing the whole batch
    const results = await Promise.allSettled(
      Array.from(this.sources.values()).map(source =>
        this.processSource(source, data)
      )
    );

    await this.reconcileResults(results);
  }
}

Caching Strategy

We implemented a multi-tier caching system:

  1. L1 Cache: In-memory Redis clusters
  2. L2 Cache: Distributed cache with 99.9% hit rate
  3. L3 Cache: Cold storage for historical data

Cache Level       Latency   Capacity   Use Case
L1 (Redis)        <1ms      50GB       Real-time quotes
L2 (Distributed)  <5ms      500GB      Recent history
L3 (Cold)         <100ms    50TB       Historical analysis
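The tiered lookup itself is simple: check the fastest tier first, and promote hits so the next read is cheaper. A minimal sketch, assuming hypothetical dict-like tier clients (the production tiers are Redis and distributed caches, not local dicts):

```python
class TieredCache:
    """Check L1 -> L2 -> L3 in order; promote hits to faster tiers."""

    def __init__(self, tiers):
        # tiers: list of dict-like caches, fastest first (L1, L2, L3)
        self.tiers = tiers

    def get(self, key):
        for i, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Promote to every faster tier so the next read hits sooner
                for faster in self.tiers[:i]:
                    faster[key] = value
                return value
        return None

    def set(self, key, value):
        # Write-through to every tier; a real system would set per-tier TTLs
        for tier in self.tiers:
            tier[key] = value
```

Promotion on read is what keeps the L1 hit rate high for hot symbols without any explicit warming logic.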

API Gateway Design

Our gateway handles rate limiting, authentication, and request routing:

class APIGateway:
    def __init__(self):
        self.rate_limiter = TokenBucketLimiter()
        self.auth_service = JWTAuthService()
        self.circuit_breaker = CircuitBreaker()

    async def handle_request(self, request: Request) -> Response:
        # Fast path for authenticated requests
        if await self.auth_service.validate(request.headers.authorization):
            return await self.route_request(request)

        # Fallback authentication
        return await self.handle_unauthenticated(request)
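The TokenBucketLimiter referenced above is not shown; a minimal single-process sketch of the idea (the real gateway presumably keys one bucket per client and backs it with shared state) might look like:

```python
import time

class TokenBucketLimiter:
    """Refill `rate` tokens per second up to `capacity`; each request costs one."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Top up the bucket for the time elapsed since the last check
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Token buckets permit short bursts up to `capacity` while enforcing the average `rate`, which suits bursty market-data clients better than a fixed per-second window.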

Performance Optimizations

Memory Management

One of our biggest challenges was managing memory allocation in a high-frequency environment:

  • Zero-copy operations where possible
  • Object pooling for frequently allocated structures
  • Custom allocators for time-critical paths
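Object pooling, for instance, can be sketched as follows. This is a simplified single-threaded illustration; the production pools live in the hot-path languages and need locking or lock-free structures:

```python
class ObjectPool:
    """Reuse expensive-to-allocate objects instead of churning the allocator."""

    def __init__(self, factory, max_size: int):
        self.factory = factory
        self.max_size = max_size
        self.free = []

    def acquire(self):
        # Reuse a pooled object if one is available, else allocate fresh
        return self.free.pop() if self.free else self.factory()

    def release(self, obj):
        # Keep at most max_size idle objects; let the rest be collected
        if len(self.free) < self.max_size:
            self.free.append(obj)
```

The payoff in a high-frequency path is fewer allocations and less GC pressure, at the cost of having to reset pooled objects before reuse.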

Network Optimization

We optimized every aspect of network communication:

Protocol Selection

  • HTTP/2 for REST APIs with multiplexing
  • WebSockets for real-time streaming
  • gRPC for internal service communication

Connection Pooling

type ConnectionPool struct {
    connections chan *Connection
    maxSize     int
    activeConns int32
}

func (p *ConnectionPool) Get() (*Connection, error) {
    select {
    case conn := <-p.connections:
        return conn, nil
    default:
        // Reserve a slot atomically so concurrent callers can't race past maxSize
        if atomic.AddInt32(&p.activeConns, 1) <= int32(p.maxSize) {
            return p.createConnection()
        }
        atomic.AddInt32(&p.activeConns, -1)
        return nil, ErrPoolExhausted
    }
}

Database Architecture

Time-Series Optimization

Financial data is inherently time-series based, so we built our database layer around this:

-- Partitioned by time for optimal query performance
CREATE TABLE market_data (
    symbol VARCHAR(10) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    price DECIMAL(18,8) NOT NULL,
    volume BIGINT NOT NULL
) PARTITION BY RANGE (timestamp);

-- Daily partitions keep time-range scans fast, e.g.:
CREATE TABLE market_data_2025_01_20 PARTITION OF market_data
    FOR VALUES FROM ('2025-01-20') TO ('2025-01-21');

-- Index optimized for the most common query pattern
-- (CONCURRENTLY is not supported on a partitioned parent;
-- the index cascades to each partition)
CREATE INDEX idx_symbol_time
ON market_data (symbol, timestamp DESC);

Read Replicas and Sharding

We horizontally partition data across multiple database clusters:

  1. Hot data (last 24 hours): SSD-backed, high-memory instances
  2. Warm data (last 30 days): Balanced storage and compute
  3. Cold data (historical): High-capacity, cost-optimized storage
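Routing a query to the right tier reduces to comparing the requested timestamp against the tier boundaries. A sketch of that decision (tier names here are illustrative labels, not our actual cluster identifiers):

```python
from datetime import datetime, timedelta, timezone

def pick_tier(requested, now=None):
    """Route a query to hot, warm, or cold storage by the age of the data."""
    now = now or datetime.now(timezone.utc)
    age = now - requested
    if age <= timedelta(hours=24):
        return "hot"   # last 24 hours: SSD-backed, high-memory instances
    if age <= timedelta(days=30):
        return "warm"  # last 30 days: balanced storage and compute
    return "cold"      # historical: high-capacity, cost-optimized storage
```

A range query that straddles a boundary fans out to both tiers and merges the results, which is why the boundaries align with the daily partitioning scheme above.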

Monitoring and Observability

Real-Time Metrics

We track hundreds of metrics in real-time:

  • Request latency (p50, p95, p99)
  • Error rates by endpoint and customer
  • Throughput across all services
  • Data freshness from ingestion to API
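For a bounded sample window, those latency percentiles need nothing more than a sort. Production pipelines use streaming sketches (t-digest, HDR histograms) instead, but the nearest-rank definition below captures the idea:

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least p% of samples at or below it
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

Tracking p95 and p99 alongside p50 matters because tail latency, not the median, is what an algorithmic-trading client actually experiences under load.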

Alerting Strategy

Our alerting system uses multiple escalation levels:

⚠️ Warning: Performance degradation detected
🚨 Critical: SLA breach imminent
🔥 Emergency: Customer-facing service down

Lessons Learned

What Worked Well

  1. Microservices architecture enabled independent scaling
  2. Immutable infrastructure reduced deployment risks
  3. Circuit breakers prevented cascade failures
  4. Comprehensive testing caught edge cases early
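A circuit breaker in its simplest form counts consecutive failures and fails fast once a threshold is crossed, then allows a trial call after a cooldown. A sketch of that pattern (thresholds and the wrapped callable are illustrative):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `reset_after` seconds."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open; failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

Failing fast is what prevents the cascade: a slow dependency stops consuming threads and timeouts upstream the moment the breaker opens.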

What We’d Do Differently

  1. Start with fewer services - we over-engineered initially
  2. Invest in tooling earlier - observability is crucial
  3. Focus on data quality from day one
  4. Plan for multi-region from the beginning

Future Improvements

Looking ahead, we’re working on several exciting enhancements:

Predictive Caching

Using machine learning to predict which data customers will request:

class PredictiveCache:
    def __init__(self):
        self.model = TimeSeriesPredictor()
        self.cache = DistributedCache()

    async def preload_predictions(self) -> None:
        predictions = await self.model.predict_next_hour()
        for symbol, probability in predictions:
            if probability > 0.8:  # High confidence
                await self.cache.warm(symbol)

Edge Computing

Deploying compute closer to our customers:

  • Regional data centers for reduced latency
  • Edge caching with smart invalidation
  • CDN integration for static content

Advanced Analytics

Real-time pattern detection and anomaly identification:

The ability to detect market anomalies in real-time opens up entirely new possibilities for our customers.

Conclusion

Building high-performance financial APIs requires attention to detail at every level of the stack. From choosing the right protocols to optimizing database queries, every decision impacts the end-user experience.

The key is to measure everything, optimize systematically, and never stop learning from both successes and failures.

Key Takeaways

  • Performance is a feature, not an afterthought
  • Observability enables confident optimization
  • Simple solutions often outperform complex ones
  • Customer feedback drives the most valuable improvements

Want to learn more about our architecture? Check out our technical documentation or reach out to our engineering team.