Export Service
Functional Features
-
Export File Format Support
- Added support for both CSV (default streaming) and JSON output formats.
- Implemented a fileType query parameter to toggle between CSV and JSON.
- Configured streaming architecture to handle large datasets efficiently.
- Enforced export size limit of 80,000 rows (~600k data points) per file.
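
A minimal sketch of how the fileType toggle and the row cap could fit together, assuming rows arrive as flat objects. The names serializeRows, csvField, and MAX_ROWS are illustrative, the JSON output is assumed to be a single array, and throwing on overflow is one possible policy (the service may truncate or reject up front instead).

```typescript
type FileType = "csv" | "json";
type Row = Record<string, string | number | null>;

const MAX_ROWS = 80_000; // row cap stated in the notes above

// Quote a CSV field only when it contains a comma, a quote, or a line break.
function csvField(value: string | number | null): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Yield the export chunk by chunk so the whole file is never held in memory.
function* serializeRows(
  rows: Iterable<Row>,
  columns: string[],
  fileType: FileType = "csv",
  maxRows: number = MAX_ROWS,
): Generator<string> {
  yield fileType === "csv" ? columns.map(csvField).join(",") + "\n" : "[";
  let count = 0;
  for (const row of rows) {
    if (count >= maxRows) throw new Error(`export exceeds ${maxRows} rows`);
    if (fileType === "csv") {
      yield columns.map((c) => csvField(row[c] ?? null)).join(",") + "\n";
    } else {
      yield (count > 0 ? "," : "") + JSON.stringify(row);
    }
    count++;
  }
  if (fileType === "json") yield "]";
}
```

Each yielded chunk can be written straight to the HTTP response or to the upload stream, which is what keeps memory use flat for large exports.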
-
Unique Task Generation
- Each export request now creates a unique task ID.
- The Synthesis and PetDB APIs both derive task IDs directly from their URL search query parameters.
- Ensures all tasks are traceable and uniquely tied to their search context.
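
The notes do not say how the ID is derived from the query parameters. One plausible approach, sketched here purely as an assumption, is to canonicalize the parameters and hash them. The function name deriveTaskId, the source prefix, the use of SHA-256, and the 32-character truncation are all illustrative choices.

```typescript
import { createHash } from "node:crypto";

// Build a stable task ID from a request URL's search parameters.
// Parameters are sorted by name so ?a=1&b=2 and ?b=2&a=1 map to the same ID.
function deriveTaskId(requestUrl: string, source: "synthesis" | "petdb"): string {
  const params = new URL(requestUrl).searchParams;
  params.sort();
  const canonical = `${source}?${params.toString()}`;
  return createHash("sha256").update(canonical).digest("hex").slice(0, 32);
}
```

A purely deterministic ID like this maps identical queries to the same task. If every request must get its own ID even for a repeated query, a timestamp or random suffix would have to be mixed in; the notes leave that open.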
-
Task Lifecycle Management
- Implemented complete task status tracking: Pending → Processing → Succeeded / Failed / Cancelled.
- All transitions logged in DynamoDB for auditability.
- Integrated error logging for Failed tasks with cause details.
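
The lifecycle can be read as a small state machine. The sketch below is an illustration under assumptions: the transition table (including Pending to Cancelled), the TransitionRecord shape, and the function name transition are not taken from the service, and the DynamoDB write is omitted.

```typescript
type TaskStatus = "Pending" | "Processing" | "Succeeded" | "Failed" | "Cancelled";

// Allowed next states. Succeeded, Failed, and Cancelled are treated as terminal.
// Allowing Pending -> Cancelled is an assumption of this sketch.
const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  Pending: ["Processing", "Cancelled"],
  Processing: ["Succeeded", "Failed", "Cancelled"],
  Succeeded: [],
  Failed: [],
  Cancelled: [],
};

// Shape of one audit entry; persisting it is left to the caller.
interface TransitionRecord {
  taskId: string;
  from: TaskStatus;
  to: TaskStatus;
  at: string; // ISO 8601 timestamp
  cause?: string; // error detail, expected for Failed tasks
}

// Validate a status change and return the record that would be logged.
function transition(
  taskId: string,
  from: TaskStatus,
  to: TaskStatus,
  cause?: string,
): TransitionRecord {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to} for task ${taskId}`);
  }
  const record: TransitionRecord = { taskId, from, to, at: new Date().toISOString() };
  if (cause !== undefined) record.cause = cause;
  return record;
}
```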
-
Queue System Implementation
- Developed a FIFO (First-In-First-Out) queue for predictable processing order.
- Supports up to 3 concurrent tasks without performance degradation.
- Added automatic retry mechanism (up to 3 attempts) for transient errors.
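
A minimal in-process sketch of FIFO ordering with a concurrency cap of 3. The class name FifoQueue and its enqueue method are hypothetical, and the real queue may well be backed by an external service rather than process memory.

```typescript
type Job<T> = () => Promise<T>;

class FifoQueue {
  private readonly concurrency: number;
  private readonly waiting: Array<() => void> = [];
  private running = 0;

  constructor(concurrency = 3) {
    this.concurrency = concurrency;
  }

  // Enqueue a job. It starts immediately if a slot is free; otherwise it
  // waits and is started in submission order as running jobs finish.
  enqueue<T>(job: Job<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.running++;
        Promise.resolve()
          .then(job)
          .then(resolve, reject)
          .finally(() => {
            this.running--;
            const next = this.waiting.shift(); // oldest waiting job first
            if (next) next();
          });
      };
      if (this.running < this.concurrency) start();
      else this.waiting.push(start);
    });
  }
}
```

A caller would use it as `queue.enqueue(() => runExportTask(taskId))`, where runExportTask stands in for whatever function performs the export.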
-
User Notification System
- Integrated email notifications (via AWS SES or equivalent) for export success or failure.
- Emails include download link, query summary, and completion timestamp.
- Supports near-real-time task progress updates via API endpoints (clients poll every 10 seconds).
- Ensures reliability via DynamoDB-logged delivery status.
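
On the client side, the 10-second polling could look like the sketch below. The fetchProgress callback is a hypothetical wrapper around the status endpoint, and the TaskProgress shape and the maxPolls budget are assumptions.

```typescript
interface TaskProgress {
  status: string;
  rowsWritten?: number; // hypothetical progress field
}

const TERMINAL_STATUSES = new Set(["Succeeded", "Failed", "Cancelled"]);

// Poll until the task reaches a terminal status or the polling budget runs out.
async function pollUntilDone(
  fetchProgress: () => Promise<TaskProgress>,
  intervalMs = 10_000,
  maxPolls = 360,
): Promise<TaskProgress> {
  for (let i = 0; i < maxPolls; i++) {
    const progress = await fetchProgress();
    if (TERMINAL_STATUSES.has(progress.status)) return progress;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("task did not finish within the polling budget");
}
```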
-
Comprehensive Logging
- Logs all task events: creation, status transitions, retries, cancellations, and completions.
- Stored in DynamoDB export table with full traceability.
- Enabled Point-in-Time Recovery (PITR) on the table, giving continuous backups with a recovery window of up to 35 days.
Security & Data Protection
-
Authentication & Authorization
- Enforced API key / OAuth authentication on all export-related routes.
- Authorized users validated against connected API credentials (Synthesis or PetDB).
-
File Encryption & Controlled Access
- All exported files encrypted before storage in AWS S3.
- Small exports (<10k rows) automatically served as browser-downloadable links.
- Pre-signed URLs generated and delivered via email notifications; anyone holding a link can use it until it expires.
-
Audit & Compliance Logging
- Every export, download, and file access event recorded in DynamoDB.
- Maintains complete export history for compliance and debugging.
Performance & Scalability
-
Optimized Data Retrieval
- Implemented single-pass OpenSearch scroll API queries for large dataset exports.
- Prevents memory overload and ensures continuous streaming efficiency.
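
The scroll pattern fetches one page at a time, passes the scroll ID back for the next page, and stops at the first empty page. The sketch below hides the transport behind a hypothetical ScrollClient interface (open, next, and clear are illustrative names, not OpenSearch client methods), so only the control flow is shown.

```typescript
interface ScrollPage<T> {
  scrollId: string;
  hits: T[];
}

// Hypothetical transport: open() issues the initial search with a scroll
// timeout, next() requests the following page, clear() releases the context.
interface ScrollClient<T> {
  open(): Promise<ScrollPage<T>>;
  next(scrollId: string): Promise<ScrollPage<T>>;
  clear(scrollId: string): Promise<void>;
}

// Yield hits one by one; only a single page is held in memory at a time.
async function* scrollAll<T>(client: ScrollClient<T>): AsyncGenerator<T> {
  let page = await client.open();
  try {
    while (page.hits.length > 0) {
      for (const hit of page.hits) yield hit;
      page = await client.next(page.scrollId);
    }
  } finally {
    // Runs on normal completion and when the consumer stops early.
    await client.clear(page.scrollId);
  }
}
```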
-
High-Speed File Generation
- Generated a 40,000-row × 2,897-column CSV in 4 minutes 20 seconds.
- Node.js stream pipeline optimized for large data transformations.
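
For illustration, a Node.js stream pipeline for row-to-CSV conversion might be wired as follows. This is a sketch rather than the service's pipeline: toCsvLines and exportToCsv are hypothetical names, and header output and field quoting are left out for brevity.

```typescript
import { Readable, Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Convert row objects to CSV lines. Objects go in, text comes out.
function toCsvLines(columns: string[]): Transform {
  return new Transform({
    writableObjectMode: true,
    transform(row: Record<string, unknown>, _encoding, done) {
      done(null, columns.map((c) => String(row[c] ?? "")).join(",") + "\n");
    },
  });
}

// pipeline() propagates backpressure from the sink to the source and
// destroys every stream in the chain if any stage fails.
async function exportToCsv(
  rows: Iterable<Record<string, unknown>> | AsyncIterable<Record<string, unknown>>,
  columns: string[],
  sink: Writable,
): Promise<void> {
  await pipeline(Readable.from(rows), toCsvLines(columns), sink);
}
```

The sink can be a file stream, an HTTP response, or an upload stream; the source can be any iterable or async iterable of rows.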
-
Concurrent Export Handling
- Supports simultaneous exports through asynchronous workers.
- Queue and worker model allows balanced load distribution across tasks.
-
Cancellation Workflow
- Implemented threaded cancellation checks within long-running exports.
- Users can cancel via UI or API endpoint, ensuring graceful cleanup.
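
One common way to implement such checks is cooperative cancellation: the export loop tests a flag between batches and cleans up when it is set. The sketch below uses the standard AbortSignal for the flag, which is an assumption about the mechanism; runExport, writeBatch, and cleanup are hypothetical names.

```typescript
// Write batches until done, checking for cancellation before each batch.
// On cancellation or any other failure, run cleanup and rethrow.
async function runExport<T>(
  batches: Iterable<T[]> | AsyncIterable<T[]>,
  writeBatch: (batch: T[]) => Promise<void>,
  signal: AbortSignal,
  cleanup: () => Promise<void>, // e.g. remove the partial file
): Promise<number> {
  let written = 0;
  try {
    for await (const batch of batches) {
      if (signal.aborted) throw new Error("export cancelled");
      await writeBatch(batch);
      written += batch.length;
    }
    return written;
  } catch (err) {
    await cleanup();
    throw err;
  }
}
```

The API handler for a cancel request would call `controller.abort()` on the AbortController whose signal was passed in; the export stops at the next batch boundary.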
Reliability, Fault Tolerance, and Maintenance
-
Automatic Retry Logic
- Up to 3 retry attempts for transient DynamoDB or OpenSearch failures.
- Ensures stability in case of network or resource interruptions.
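
A retry wrapper along these lines could look like the sketch below. It reads "3x" as three attempts in total and adds exponential backoff between attempts; both points, along with the withRetry name and the isTransient callback, are assumptions rather than details from the notes.

```typescript
// Run an operation, retrying only errors the caller classifies as transient.
async function withRetry<T>(
  operation: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on permanent errors or once the attempt budget is spent.
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // 200, 400, ...
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Classifying errors through a callback keeps the wrapper independent of any particular client library; the caller decides which timeouts or throttling responses count as transient.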
-
Graceful Error Handling
- Added structured error messaging for user-facing API responses.
- Logged error stacks internally with contextual data for debugging.
-
Horizontal Scaling Ready
- Architecture supports adding multiple worker instances to scale horizontally.
- Fully decoupled queue client allows easy distribution across containers.
-
Monitoring and Observability
- Configured structured logs viewable through ECS console.
- Task lifecycle, performance metrics, and error patterns are traceable in real time.
Storage, Cleanup, and Retention
-
S3 Integration
- All export outputs stored securely in AWS S3 buckets.
- Pre-signed URLs auto-generated and sent to users for direct access.
-
Data Retention & Cleanup
- Retention policy placeholders in place for automatic export cleanup (configurable).
- Current design retains all exports indefinitely for accessibility.