Building a platform designed to analyze 100K+ GitHub profiles while maintaining sub-second response times required careful architectural decisions. Here's how we did it.
The Challenge
We needed to build a system that:
- Fetches data from GitHub API efficiently
- Performs complex statistical calculations
- Handles 10K+ requests per day
- Maintains 24-hour caching
- Scales cost-effectively
Tech Stack Overview
Frontend
- Next.js 14: App router, server components, streaming
- React 18: Concurrent features, suspense
- Tailwind CSS: Utility-first styling
- Framer Motion: Smooth animations
- TypeScript: Type safety throughout
Backend
- Next.js API Routes: Serverless functions
- Prisma ORM: Type-safe database access
- PostgreSQL: Primary data store
- Redis: Caching layer (planned)
Infrastructure
- Vercel: Hosting and edge functions
- Vercel Postgres: Managed database
- GitHub API: Primary data source
Architecture Decisions
Server-Side First
Decision: Use server components by default (sketch below)
Reasoning:
- Reduced client bundle size
- Better SEO
- Faster initial page load
- Access to backend resources
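Here's a minimal sketch of what that looks like in the app router. The `prisma` helper, `profile` model, and route path are illustrative names, not our exact code:

```tsx
// app/profile/[username]/page.tsx
// A server component: it runs only on the server, so it can query the
// database directly and ships no fetching logic to the browser.
import { prisma } from "@/lib/prisma";

export default async function ProfilePage({
  params,
}: {
  params: { username: string };
}) {
  const profile = await prisma.profile.findUnique({
    where: { username: params.username },
  });

  if (!profile) return <p>Profile not found</p>;

  return (
    <main>
      <h1>{profile.username}</h1>
      <p>Score: {profile.score}</p>
    </main>
  );
}
```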
Smart Caching Strategy
Decision: 24-hour cache with background revalidation (sketch below)
Why 24 hours?
- GitHub API rate limits (5,000 authenticated requests/hour)
- User data doesn't change significantly daily
- A practical balance between freshness and cost
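Here's a rough sketch of the read path, assuming a `fetchedAt` timestamp column and a `refreshFromGitHub` helper (both illustrative names):

```typescript
import { prisma } from "@/lib/prisma";

const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

export async function getProfile(username: string) {
  const cached = await prisma.profile.findUnique({ where: { username } });

  // Fresh enough: serve straight from Postgres, no GitHub call.
  if (cached && Date.now() - cached.fetchedAt.getTime() < TTL_MS) {
    return cached;
  }

  // Stale: return the old row immediately and revalidate in the
  // background. (Fire-and-forget needs care on serverless platforms,
  // e.g. a background job or a waitUntil-style keep-alive API.)
  if (cached) {
    void refreshFromGitHub(username);
    return cached;
  }

  // First visit: nothing cached, so fetch synchronously.
  return refreshFromGitHub(username);
}

async function refreshFromGitHub(username: string) {
  // Fetch from the GitHub API, recompute metrics, and upsert the row;
  // auth, error handling, and the metric pipeline are omitted here.
  const res = await fetch(`https://api.github.com/users/${username}`);
  const user: { login: string } = await res.json();
  return prisma.profile.upsert({
    where: { username: user.login },
    update: { fetchedAt: new Date() },
    create: { username: user.login, fetchedAt: new Date() },
  });
}
```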
Statistical Engine
Decision: Server-side z-score normalization
Our z-score algorithm calculates how many standard deviations a developer's metrics are from the mean, then normalizes the result to a 0-100 scale. This provides fair comparisons across developers with different contribution styles.
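A simplified version of that calculation, with the ±3σ clamping range chosen for illustration rather than taken from our production constants:

```typescript
// Population mean and standard deviation over one metric, e.g. total
// stars across all analyzed profiles.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, b) => a + (b - m) ** 2, 0) / xs.length);
}

// z-score: how many standard deviations `value` is from the mean,
// mapped linearly so -3 sigma -> 0 and +3 sigma -> 100.
export function normalizedScore(value: number, population: number[]): number {
  const sd = stdDev(population);
  if (sd === 0) return 50; // no spread: everyone scores as average
  const z = (value - mean(population)) / sd;
  const clamped = Math.max(-3, Math.min(3, z));
  return ((clamped + 3) / 6) * 100;
}
```

A developer exactly at the mean lands at 50; one three standard deviations above the mean lands at 100.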
Rate Limit Handling
Challenge: GitHub API limits (5,000 authenticated requests/hour)
Solution: Multi-layered approach (sketch below)
- Database caching (primary)
- Request queuing
- Rate limit monitoring
- Graceful degradation
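The monitoring layer leans on GitHub's documented `x-ratelimit-*` response headers; the threshold and error type below are illustrative choices:

```typescript
class RateLimitLowError extends Error {}

export async function githubFetch(url: string, token: string): Promise<Response> {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });

  // GitHub reports the remaining budget and the reset time (unix
  // seconds) on every response.
  const remaining = Number(res.headers.get("x-ratelimit-remaining") ?? "0");
  const resetAt = Number(res.headers.get("x-ratelimit-reset") ?? "0");

  // Graceful degradation: when the budget runs low, stop issuing
  // fresh requests and let callers fall back to cached data until
  // the window resets.
  if (remaining < 100) {
    const minutes = Math.ceil((resetAt * 1000 - Date.now()) / 60_000);
    throw new RateLimitLowError(`rate limit low; resets in ~${minutes} min`);
  }

  return res;
}
```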
Data Model
Decision: Denormalized for read performance
Our database schema stores pre-calculated metrics like total stars, forks, and commit counts directly in the profile table. This denormalized approach trades storage space for query speed.
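In TypeScript terms, a profile row looks roughly like this (field names are illustrative):

```typescript
// Aggregates are computed once at ingest time, so the dashboard reads
// a single row instead of joining and summing over repositories on
// every request.
interface Profile {
  username: string;
  // Pre-calculated, denormalized metrics:
  totalStars: number;
  totalForks: number;
  totalCommits: number;
  score: number;      // normalized 0-100 score
  percentile: number; // rank among all analyzed profiles
  fetchedAt: Date;    // drives the 24-hour cache
}
```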
Why Denormalized?
- Faster dashboard loads
- Simpler queries
- Fewer joins
- Better caching
Performance Optimizations
Parallel Data Fetching
Instead of fetching data sequentially, we use Promise.all to fetch user data, repositories, and statistics in parallel. This significantly reduces overall response time.
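A sketch of that fan-out against the public GitHub REST endpoints (pagination and error handling omitted). Because the three calls are independent, total latency is roughly the slowest call rather than the sum of all three:

```typescript
export async function fetchProfileData(username: string, token: string) {
  const headers = { Authorization: `Bearer ${token}` };
  const [user, repos, events] = await Promise.all([
    fetch(`https://api.github.com/users/${username}`, { headers }),
    fetch(`https://api.github.com/users/${username}/repos?per_page=100`, { headers }),
    fetch(`https://api.github.com/users/${username}/events?per_page=100`, { headers }),
  ]);
  return {
    user: await user.json(),
    repos: await repos.json(),
    events: await events.json(),
  };
}
```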
Streaming Responses
For large datasets, we use React Suspense to stream content to users as it becomes available, improving perceived performance.
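A minimal sketch, with illustrative component names and the slow data dependency stubbed out:

```tsx
import { Suspense } from "react";

async function computeStats(username: string): Promise<{ score: number }> {
  // Stand-in for the expensive statistics pipeline.
  await new Promise((resolve) => setTimeout(resolve, 1500));
  return { score: 87 };
}

function ProfileHeader({ username }: { username: string }) {
  return <h1>{username}</h1>; // cheap: part of the initial HTML
}

async function StatsPanel({ username }: { username: string }) {
  const stats = await computeStats(username); // slow: blocks only this subtree
  return <p>Score: {stats.score}</p>;
}

export default function Dashboard({ params }: { params: { username: string } }) {
  return (
    <main>
      <ProfileHeader username={params.username} />
      {/* The fallback ships immediately; the panel's HTML streams in
          once computeStats resolves. */}
      <Suspense fallback={<p>Crunching statistics…</p>}>
        <StatsPanel username={params.username} />
      </Suspense>
    </main>
  );
}
```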
Edge Functions
We deploy serverless functions to edge locations close to our users, keeping response times low no matter where a request originates.
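In the app router this is a one-line opt-in per route; the route path here is illustrative:

```typescript
// app/api/profile/[username]/route.ts
export const runtime = "edge"; // run this route in the Edge runtime

export async function GET(
  _request: Request,
  { params }: { params: { username: string } }
) {
  // Lightweight, cache-friendly lookups suit the edge; heavier
  // Postgres work stays in regular serverless functions.
  const res = await fetch(`https://api.github.com/users/${params.username}`);
  return Response.json(await res.json());
}
```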
Database Indexes
Strategic indexing on commonly queried fields like score, percentile, and username ensures fast lookups even as the database grows.
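These are the lookups those indexes serve (model and field names as assumed above); in the Prisma schema they correspond to a `@unique` on `username` and `@@index` markers on fields like `score`:

```typescript
import { prisma } from "@/lib/prisma";

export async function dashboardQueries(username: string) {
  // Point lookup on the unique, indexed `username` column.
  const profile = await prisma.profile.findUnique({ where: { username } });

  // Leaderboard: with an index on `score`, Postgres reads the top of a
  // sorted index instead of scanning and sorting the whole table.
  const top = await prisma.profile.findMany({
    orderBy: { score: "desc" },
    take: 100,
  });

  return { profile, top };
}
```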
Scalability Considerations
Current Load
- 10K+ unique profiles analyzed
- 1K+ active users daily
- 50K+ API requests/month
- 99.9% uptime
Scaling Strategy
Vertical Scaling (Current):
- Vercel Pro plan
- Postgres connection pooling
- Efficient queries
Horizontal Scaling (Future):
- Redis for caching
- Read replicas
- CDN for static assets
- Worker queues for heavy jobs
Monitoring and Observability
Metrics We Track
- Response times (p50, p95, p99)
- Error rates
- API rate limit usage
- Database query performance
- User engagement
Tools
- Vercel Analytics
- Database query logs
- Custom logging
- Error tracking
Cost Optimization
Current Costs
- Hosting: approximately $20/month (Vercel Pro)
- Database: approximately $25/month (Postgres)
- API: $0 (the GitHub API is free within its rate limits)
- Total: approximately $45/month
Optimization Strategies
- Aggressive caching
- Efficient queries
- Serverless architecture
- Static page generation
Lessons Learned
What Worked Well
- Server components reduced complexity
- Prisma made database work pleasant
- Caching strategy solved rate limits
- TypeScript caught bugs early
- Vercel simplified deployment
What We'd Change
- Add Redis earlier
- Implement queue system sooner
- More comprehensive error handling
- Better monitoring from day one
- API versioning strategy
Future Improvements
Short Term (Q1 2025)
- Redis caching layer
- Background job queue
- Advanced analytics
- API rate limit dashboard
- Performance monitoring
Long Term (2025)
- Real-time updates
- ML-based predictions
- Multi-language support
- Mobile app
- Enterprise features
Open Source
We believe in transparency. Check out:
- Our statistical algorithms
- Database schema
- API documentation
- Performance benchmarks
Conclusion
Building GitCheck taught us that:
- Simple architectures scale better
- Caching solves most problems
- TypeScript is worth it
- Measure everything
- Users care about speed
The tech stack matters, but architecture decisions matter more.