Export Service

v3.0

Functional Features

  1. Export File Format Support

    • Added support for CSV (the default, streamed) and JSON output formats.
    • Implemented fileType query parameter to toggle between CSV and JSON.
    • Configured streaming architecture to handle large datasets efficiently.
    • Enforced export size limit of 80,000 rows (~600k data points) per file.
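
The format toggle and row limit above might look roughly like the following. This is a minimal sketch: only the `fileType` parameter, the CSV default, and the 80,000-row limit come from these notes. The function names, the case-insensitive matching, and the choice to reject (rather than truncate) oversized exports are assumptions.

```typescript
// Hypothetical helper names; only `fileType`, the CSV default, and the
// 80,000-row limit are taken from the release notes.
type ExportFormat = "csv" | "json";

const MAX_ROWS_PER_FILE = 80000;

// Resolve the requested format; CSV is used when the parameter is absent.
// Case-insensitive matching is an assumption.
function resolveFormat(fileType: string | undefined): ExportFormat {
  if (fileType === undefined || fileType === "") return "csv";
  const normalized = fileType.toLowerCase();
  if (normalized === "csv") return "csv";
  if (normalized === "json") return "json";
  throw new Error(`Unsupported fileType: ${fileType}`);
}

// Reject exports that would exceed the per-file row limit (assumed behavior).
function assertWithinRowLimit(rowCount: number): void {
  if (rowCount > MAX_ROWS_PER_FILE) {
    throw new Error(
      `Export of ${rowCount} rows exceeds the limit of ${MAX_ROWS_PER_FILE}`
    );
  }
}
```
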
  2. Unique Task Generation

    • Each export request now creates a unique task ID.
    • The Synthesis and PetDB APIs both derive task IDs directly from their URL search query parameters.
    • Ensures all tasks are traceable and uniquely tied to their search context.
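
One way to derive a task ID from query parameters is to canonicalize them (so parameter order does not matter) and hash the result. The sketch below assumes that approach. The FNV-1a hash, the `source-hash` ID layout, and all function names are illustrative assumptions, not the service's actual scheme. Identical queries yield identical IDs here; if every request must get a distinct ID, a timestamp or counter would also have to be mixed in, and these notes do not say which applies.

```typescript
type ApiSource = "synthesis" | "petdb";

// FNV-1a (32-bit) over UTF-16 code units; a dependency-free stand-in for
// whatever hash the service really uses (assumption).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Sort parameters by key so that ?a=1&b=2 and ?b=2&a=1 are equivalent.
function canonicalQuery(params: Record<string, string>): string {
  return Object.entries(params)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

// Hypothetical ID layout: "<source>-<8 hex digits>".
function deriveTaskId(source: ApiSource, params: Record<string, string>): string {
  return `${source}-${fnv1a(canonicalQuery(params))}`;
}
```
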
  3. Task Lifecycle Management

    • Implemented complete task status tracking:
      • Pending → Processing → Succeeded / Failed / Cancelled.
    • All transitions logged in DynamoDB for auditability.
    • Integrated error logging for Failed tasks with cause details.
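
The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the status names come from these notes, but the transition table (for example, allowing a Pending task to be cancelled) is an assumption, and an in-memory array stands in for the DynamoDB log.

```typescript
type TaskStatus = "Pending" | "Processing" | "Succeeded" | "Failed" | "Cancelled";

// Assumed transition table; terminal states allow no further transitions.
const ALLOWED_TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  Pending: ["Processing", "Cancelled"],
  Processing: ["Succeeded", "Failed", "Cancelled"],
  Succeeded: [],
  Failed: [],
  Cancelled: [],
};

interface TransitionLog {
  taskId: string;
  from: TaskStatus;
  to: TaskStatus;
  cause: string | null; // populated for Failed tasks
}

// Validate a transition and append it to the log (in-memory stand-in for
// the DynamoDB table).
function transition(
  taskId: string,
  from: TaskStatus,
  to: TaskStatus,
  log: TransitionLog[],
  cause: string | null = null
): TaskStatus {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal transition ${from} -> ${to} for task ${taskId}`);
  }
  log.push({ taskId, from, to, cause });
  return to;
}
```
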
  4. Queue System Implementation

    • Developed a FIFO (First-In-First-Out) queue for predictable processing order.
    • Processes up to 3 tasks in parallel without performance degradation.
    • Added automatic retry mechanism (up to 3 attempts) for transient errors.
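
A minimal sketch of the FIFO ordering and the 3-task parallelism limit, modeled as an in-memory scheduler. The class and method names are hypothetical, and the real queue backend is not specified in these notes.

```typescript
// In-memory model: tasks start in arrival order, at most `maxParallel` at once.
class FifoScheduler {
  private readonly waiting: string[] = [];
  private readonly running = new Set<string>();
  private readonly maxParallel: number;

  constructor(maxParallel: number = 3) {
    this.maxParallel = maxParallel;
  }

  enqueue(taskId: string): void {
    this.waiting.push(taskId);
  }

  // Start the oldest waiting tasks until the limit is reached.
  // Returns the IDs that were started by this call.
  startNext(): string[] {
    const started: string[] = [];
    while (this.running.size < this.maxParallel && this.waiting.length > 0) {
      const next = this.waiting.shift();
      if (next === undefined) break;
      this.running.add(next);
      started.push(next);
    }
    return started;
  }

  // Free a slot when a task reaches a terminal state.
  complete(taskId: string): void {
    this.running.delete(taskId);
  }
}
```

With five queued tasks, the first call to `startNext()` starts the three oldest; the fourth starts only after one of them completes.
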
  5. User Notification System

    • Integrated email notifications (via AWS SES or equivalent) for export success or failure.
    • Emails include download link, query summary, and completion timestamp.
    • Supports near-real-time task progress updates via API endpoints (clients poll every 10 seconds).
    • Ensures reliability via DynamoDB-logged delivery status.
  6. Comprehensive Logging

    • Logs all task events: creation, status transitions, retries, cancellations, and completions.
    • Stored in DynamoDB export table with full traceability.
    • Enabled Point-in-Time Recovery (PITR) on the table, which provides continuous backups with a 35-day recovery window.

Security & Data Protection

  1. Authentication & Authorization

    • Enforced API key / OAuth authentication on all export-related routes.
    • Authorized users validated against connected API credentials (Synthesis or PetDB).
  2. File Encryption & Controlled Access

    • All exported files encrypted before storage in AWS S3.
    • Small exports (<10k rows) automatically served as browser-downloadable links.
    • Pre-signed URLs (usable by anyone holding the link until they expire) generated and delivered via email notifications.
  3. Audit & Compliance Logging

    • Every export, download, and file access event recorded in DynamoDB.
    • Maintains complete export history for compliance and debugging.

Performance & Scalability

  1. Optimized Data Retrieval

    • Implemented single-pass OpenSearch scroll API queries for large dataset exports.
    • Prevents memory overload and ensures continuous streaming efficiency.
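
The scroll pattern can be sketched as a generator that keeps requesting the next page until an empty page comes back, so only one page is held in memory at a time. In the OpenSearch scroll API the first search returns a `_scroll_id` that is passed to later scroll requests. Here that round trip is abstracted behind a hypothetical `fetchPage` callback, and the sketch is synchronous for brevity, whereas a real client would be asynchronous.

```typescript
interface ScrollPage<T> {
  scrollId: string; // stands in for the `_scroll_id` returned by OpenSearch
  hits: T[];
}

// Hypothetical callback: null means "run the initial search", otherwise
// "continue the scroll identified by scrollId".
type FetchPage<T> = (scrollId: string | null) => ScrollPage<T>;

// Yield every hit exactly once, one page at a time.
function* scrollAll<T>(fetchPage: FetchPage<T>): Generator<T, void, undefined> {
  let scrollId: string | null = null;
  while (true) {
    const page: ScrollPage<T> = fetchPage(scrollId);
    if (page.hits.length === 0) return; // an empty page ends the scroll
    for (const hit of page.hits) yield hit;
    scrollId = page.scrollId;
  }
}
```
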
  2. High-Speed File Generation

    • Generated a 40,000-row × 2,897-column CSV (about 116 million cells) in 4 minutes 20 seconds, roughly 446,000 cells per second.
    • Node.js stream pipeline optimized for large data transformations.
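
As an illustration of row-at-a-time CSV generation, the sketch below emits one encoded line per row, so memory use does not grow with the export size. The function names and the RFC 4180-style quoting are assumptions; these notes do not describe the actual transform.

```typescript
type Cell = string | number | null;

// Quote a field only when it contains a quote, comma, or line break;
// embedded quotes are doubled (RFC 4180 convention).
function escapeCsvField(value: Cell): string {
  if (value === null) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Lazily yield the header line followed by one line per row.
function* toCsvLines(
  header: string[],
  rows: Iterable<Cell[]>
): Generator<string, void, undefined> {
  yield header.map((field) => escapeCsvField(field)).join(",") + "\r\n";
  for (const row of rows) {
    yield row.map((field) => escapeCsvField(field)).join(",") + "\r\n";
  }
}
```

In Node.js, a generator like this could be wrapped with `Readable.from` and piped to the output with `stream.pipeline`, which handles backpressure.
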
  3. Concurrent Export Handling

    • Supports simultaneous exports through asynchronous workers.
    • Queue and worker model allows balanced load distribution across tasks.
  4. Cancellation Workflow

    • Implemented periodic cancellation checks inside long-running exports.
    • Users can cancel via UI or API endpoint, ensuring graceful cleanup.
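
A cancellation check inside a long-running export might look like the sketch below: the loop consults a cancellation flag every N rows and runs cleanup before stopping. The names, the check interval, and the callback-based design are all assumptions made for illustration.

```typescript
interface ExportOutcome {
  status: "Succeeded" | "Cancelled";
  rowsWritten: number;
}

// Write rows one by one, checking for cancellation every `checkEvery` rows.
// `isCancelled` would typically read the task status from storage;
// `onCleanup` would remove partial output.
function runExport<T>(
  rows: Iterable<T>,
  writeRow: (row: T) => void,
  isCancelled: () => boolean,
  onCleanup: () => void,
  checkEvery: number = 1000
): ExportOutcome {
  let rowsWritten = 0;
  for (const row of rows) {
    if (rowsWritten % checkEvery === 0 && isCancelled()) {
      onCleanup();
      return { status: "Cancelled", rowsWritten };
    }
    writeRow(row);
    rowsWritten += 1;
  }
  return { status: "Succeeded", rowsWritten };
}
```

A smaller `checkEvery` makes cancellation more responsive at the cost of more status lookups.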

Reliability, Fault Tolerance, and Maintenance

  1. Automatic Retry Logic

    • Retries transient DynamoDB or OpenSearch failures up to 3 times.
    • Ensures stability in case of network or resource interruptions.
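
The retry rule can be sketched as a wrapper that re-runs an operation only when the failure is classified as transient. This sketch treats 3 as the total number of attempts, which is one reading of the notes. The `TransientError` naming convention and the helper names are hypothetical, and the code is synchronous for brevity, whereas real DynamoDB or OpenSearch calls would be awaited.

```typescript
// Hypothetical convention: transient failures carry the name "TransientError".
function transientError(message: string): Error {
  const error = new Error(message);
  error.name = "TransientError";
  return error;
}

function isTransient(error: unknown): boolean {
  return error instanceof Error && error.name === "TransientError";
}

// Run `operation` up to `maxAttempts` times; non-transient errors and the
// final transient error are rethrown to the caller.
function withRetry<T>(operation: (attempt: number) => T, maxAttempts: number = 3): T {
  let lastError: unknown = new Error("withRetry: maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return operation(attempt);
    } catch (error) {
      if (!isTransient(error)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}
```
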
  2. Graceful Error Handling

    • Added structured error messaging for user-facing API responses.
    • Logged error stacks internally with contextual data for debugging.
  3. Horizontal Scaling Ready

    • Architecture supports adding multiple worker instances to scale horizontally.
    • Fully decoupled queue client allows easy distribution across containers.
  4. Monitoring and Observability

    • Configured structured logs viewable through ECS console.
    • Task lifecycle, performance metrics, and error patterns are traceable in real-time.

Storage, Cleanup, and Retention

  1. S3 Integration

    • All export outputs stored securely in AWS S3 buckets.
    • Pre-signed URLs auto-generated and sent to users for direct access.
  2. Data Retention & Cleanup

    • Retention policy placeholders in place for automatic export cleanup (configurable).
    • Current design retains all exports indefinitely for accessibility.
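
A configurable retention check could be as small as the sketch below, where `null` models the current keep-everything behavior. The function name and the day-based setting are assumptions; these notes only state that placeholders exist.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// retentionDays === null means "retain indefinitely" (the current design).
function isExpired(
  createdAtMs: number,
  nowMs: number,
  retentionDays: number | null
): boolean {
  if (retentionDays === null) return false;
  return nowMs - createdAtMs > retentionDays * MS_PER_DAY;
}
```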