PetDB API
Core Overview
The PetDB API v4 provides a unified interface for querying, aggregating, and exporting geochemical data.
Built with Node.js and Express, the API connects to AWS DynamoDB for task management, OpenSearch for dataset querying, and AWS S3 for export storage.
All routes are designed for high performance, modular scalability, and error resilience, and are intended for use by EarthChem Synthesis.
The PetDB API v4 implementation now includes 35 fully realized production features, delivering:
- Real-time OpenSearch aggregations
- Secure export generation via S3 and DynamoDB
- Geospatial filtering and location services
- Advanced citation and sample linking
- Logging and audit trail across all queries
Functional Features
- Unified Vocabulary Endpoints
  - Provides hierarchical and flat data aggregations for geochemical vocabularies.
  - Implemented GET /v4/* endpoints for geoFeatures, taxons, variables, analysisTypes, authors, and more.
  - Supports nested composite aggregations sourced from OpenSearch composite queries.
  - Delivers structured, normalized responses to the client in real time.
- GeoFeature Hierarchies
  - GET /v4/geoFeatures returns type → name relationships for geological features.
  - Utilizes composite aggregations from OpenSearch fields.
- Taxonomy Aggregations
  - GET /v4/taxons exposes hierarchical sample taxonomy data (parent/child).
  - Dynamically builds aggregations from sampleTaxons.
  - Fully normalized across all available datasets.
- Expedition and Author Retrieval
  - GET /v4/expeditions returns expedition names grouped by dataset origin.
  - GET /v4/authors lists citation authors (familyName only) for publication filtering.
  - Designed for low-latency, pre-aggregated access to metadata.
- Citation Information
  - Endpoints for citationTitles, publicationYears, and journals allow fast metadata retrieval.
  - Optimized for keyword search and aggregation filtering.
  - Supports pagination and client-side auto-suggestions.
- Variable and Analysis Type Hierarchies
  - GET /v4/variables and GET /v4/analysisTypes support multi-level relationships.
  - Example: AnalysisType → MineralName → MaterialName → InclusionType.
  - Aggregations sourced from nested OpenSearch documents.
- Laboratory and Data Source Endpoints
  - Provides non-hierarchical aggregation for laboratories and dataSources.
  - Useful for filtering data provenance and processing origins.
- Sample Name Retrieval
  - GET /v4/sampleNames returns flattened sample name lists from OpenSearch.
  - Used for cross-index matching and display filtering on UI.
- Multi-Vocabulary Suggestions
  - GET /v4/ endpoint aggregates top 10 vocabulary suggestions across all vocabularies simultaneously.
  - Useful for search auto-complete and smart query prediction.
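
The hierarchical endpoints in this section (for example the type → name pairs behind GET /v4/geoFeatures) could be served by a two-source composite aggregation whose flat buckets are folded into a tree. The following is a minimal sketch under stated assumptions, not the actual PetDB implementation: the field names, the aggregation name `hierarchy`, and both helper functions are hypothetical. Fields inside nested documents would additionally need a `nested` aggregation wrapper, which is omitted here.

```javascript
// Hypothetical helper: builds an OpenSearch request body with a composite
// aggregation over two keyword fields (the field names are assumptions).
function buildHierarchyAggregation(parentField, childField, pageSize = 1000) {
  return {
    size: 0, // only aggregation buckets are needed, not the matching documents
    aggs: {
      hierarchy: {
        composite: {
          size: pageSize,
          sources: [
            { parent: { terms: { field: parentField } } },
            { child: { terms: { field: childField } } },
          ],
        },
      },
    },
  };
}

// Hypothetical helper: folds flat composite buckets of the form
// { key: { parent, child }, doc_count } into { parent: [child, ...] }.
function foldBuckets(buckets) {
  const tree = {};
  for (const { key } of buckets) {
    if (!tree[key.parent]) tree[key.parent] = [];
    tree[key.parent].push(key.child);
  }
  return tree;
}

// Hand-written buckets shaped like a composite response (illustrative data only).
const sampleBuckets = [
  { key: { parent: "crater", child: "plum crater" }, doc_count: 12 },
  { key: { parent: "crater", child: "north ray crater" }, doc_count: 7 },
  { key: { parent: "ridge", child: "mid-atlantic ridge" }, doc_count: 230 },
];
const geoFeatureTree = foldBuckets(sampleBuckets);
console.log(JSON.stringify(geoFeatureTree));
```

Deeper hierarchies such as AnalysisType → MineralName → MaterialName → InclusionType would follow the same pattern with more composite sources and a nested fold.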
Citation System Features
- Comprehensive Citation Retrieval
  - GET /v4/citations/:id retrieves complete citation records (authors, journals, year).
  - Data sourced from OpenSearch and structured in normalized JSON.
  - Implements caching layer for high-frequency queries.
- Citation-Sample Linking
  - GET /v4/citations/:id/samples connects citations to related sample records.
  - Enables dataset traceability between literature and sample evidence.
- Citation-Method Association
  - GET /v4/citations/:id/methods provides all analytical methods used in the citation context.
  - Supports data provenance tracking and analytical reproducibility.
- Searchable Citations Endpoint
  - GET /v4/citations?{search} supports filtered search queries using OpenSearch.
  - Integrates fuzzy matching and partial search on multiple fields.
  - Handles pagination and size constraints for performance.
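
As an illustration of how a citation search with fuzzy matching and bounded pagination might be translated into an OpenSearch request body, here is a minimal sketch. The searched field names, the `q`, `page`, and `size` parameter names, and the page-size cap of 100 are all assumptions; the source does not specify them.

```javascript
// Assumed field names and cap; the real index mapping may differ.
const CITATION_FIELDS = ["title", "authors.familyName", "journal"];
const MAX_PAGE_SIZE = 100;

// Hypothetical helper: turns search parameters into an OpenSearch body.
function buildCitationSearch({ q, page = 1, size = 20 } = {}) {
  // Clamp size to [1, MAX_PAGE_SIZE]; fall back to 20 on missing or invalid input.
  const safeSize = Math.min(Math.max(Number(size) || 20, 1), MAX_PAGE_SIZE);
  const safePage = Math.max(Number(page) || 1, 1);
  const query = q
    ? { multi_match: { query: q, fields: CITATION_FIELDS, fuzziness: "AUTO" } }
    : { match_all: {} };
  return { from: (safePage - 1) * safeSize, size: safeSize, query };
}

console.log(JSON.stringify(buildCitationSearch({ q: "basalt", page: 3, size: 500 })));
```

Clamping `size` on the server keeps a single request from pulling an unbounded number of hits. Very deep pages are still limited by the index's result window, which is one reason to use scroll or composite pagination for bulk retrieval.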
Sample Data System
- Sample Metadata Retrieval
  - GET /v4/samples/:id returns all sample information, fully normalized.
  - GET /v4/samples/:id/metadata exposes associated metadata fields for analysis and export.
  - OpenSearch scroll queries used for large sample retrievals.
Export System
- Export Submission Endpoint
  - GET /v4/exports/submit?{search} submits export requests to the system.
  - Adds entries into DynamoDB with metadata: timestamp, user email, and query context.
  - Supports multiple export file types (CSV/JSON).
- Export Task Status
  - GET /v4/exports/:taskId retrieves live status of an export (Pending, Processing, Failed, Succeeded).
  - Dynamically updated by background processors through DynamoDB streams.
- Export Cancellation
  - GET /v4/exports/cancel/:taskId cancels a pending export task.
  - Triggers update in DynamoDB to set status=CANCELLED.
  - Sends optional cancellation email notification to requester.
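
The export lifecycle described in this section can be sketched as a task record plus a guarded status transition. This is an illustrative sketch only: the record attribute names, the uppercase status spelling, and both helper functions are assumptions, and no DynamoDB call is made here.

```javascript
// Status values named in the text; the uppercase spelling is an assumption.
const STATUS = Object.freeze({
  PENDING: "PENDING",
  PROCESSING: "PROCESSING",
  FAILED: "FAILED",
  SUCCEEDED: "SUCCEEDED",
  CANCELLED: "CANCELLED",
});

// Hypothetical helper: builds the record stored when an export is submitted.
function createExportTask({ taskId, email, query, fileType = "csv", now = new Date() }) {
  if (!["csv", "json"].includes(fileType)) {
    throw new RangeError(`Unsupported export file type: ${fileType}`);
  }
  return {
    taskId,
    email,
    query, // the search parameters that define what is exported
    fileType,
    status: STATUS.PENDING,
    createdAt: now.toISOString(),
  };
}

// Hypothetical helper: only a pending task may be cancelled.
function cancelExportTask(task, now = new Date()) {
  if (task.status !== STATUS.PENDING) {
    throw new Error(`Cannot cancel a task in status ${task.status}`);
  }
  return { ...task, status: STATUS.CANCELLED, cancelledAt: now.toISOString() };
}

const task = createExportTask({
  taskId: "example-task-1",
  email: "user@example.org",
  query: { authors: "Smith" },
  now: new Date("2024-01-01T00:00:00Z"),
});
console.log(cancelExportTask(task, new Date("2024-01-01T00:05:00Z")).status);
```

When the record lives in DynamoDB, the same guard is usually expressed as a conditional update, so a task that a background processor has already picked up is not cancelled by a late request.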
Metrics & Monitoring
- System Metrics Endpoint
  - GET /v4/metrics returns dataset-wide statistics (sample counts, citations, data points).
  - Optimized for API dashboards and usage visualization.
- Export Table Metrics
  - GET /v4/metrics/exports/:taskStatus lists all exports filtered by task status.
  - Includes email, IP, export purpose, file location, timestamps, and status history.
Location Services
- Location Query Endpoints
  - GET /v4/locations?{search} returns clustered sample coordinates.
  - Optimized for use with mapping clients (Mapbox/MapLibre).
  - Results returned as aggregated cluster GeoJSON.
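
One plausible way to produce cluster GeoJSON is a `geohash_grid` aggregation with a `geo_centroid` sub-aggregation, converted bucket by bucket into point features. The sketch below assumes that approach; the source does not say which grid aggregation PetDB uses, and the field name `location`, the aggregation names, and the feature properties are hypothetical.

```javascript
// Hypothetical helper: request body for grid clustering with one centroid per cell.
function buildClusterAggregation(field, precision) {
  return {
    size: 0,
    aggs: {
      clusters: {
        geohash_grid: { field, precision },
        aggs: { centroid: { geo_centroid: { field } } },
      },
    },
  };
}

// Hypothetical helper: converts grid buckets into a GeoJSON FeatureCollection.
// GeoJSON positions are ordered [longitude, latitude].
function bucketsToGeoJSON(buckets) {
  return {
    type: "FeatureCollection",
    features: buckets.map((bucket) => ({
      type: "Feature",
      geometry: {
        type: "Point",
        coordinates: [bucket.centroid.location.lon, bucket.centroid.location.lat],
      },
      properties: { geohash: bucket.key, count: bucket.doc_count },
    })),
  };
}

// Hand-written bucket shaped like a geohash_grid response (illustrative data only).
const clusterGeoJSON = bucketsToGeoJSON([
  { key: "9q8", doc_count: 42, centroid: { location: { lat: 37.7, lon: -122.4 }, count: 42 } },
]);
console.log(JSON.stringify(clusterGeoJSON));
```

Mapping clients such as Mapbox and MapLibre can load a FeatureCollection like this as a GeoJSON source and size each marker by the `count` property.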
- Sample Location Metadata
  - GET /v4/locations/samples?{search} retrieves sample metadata including lat/lon and sample IDs.
  - Supports UI rendering of marker popups and tooltips.
- Tile-Based Location Clustering
  - GET /v4/locations/tile?{search} returns tile-based cluster grids.
  - Enables efficient display of large-scale datasets in map tiles.
- Proximity Search
  - GET /v4/locations/point?{search} retrieves all samples within a radial distance from given coordinates.
  - Uses OpenSearch geo-distance queries with precision control.
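
A proximity search of this kind maps naturally onto the OpenSearch `geo_distance` query. The following sketch shows one way to build such a request; the parameter names (`lat`, `lon`, `radiusKm`), the `location` field name, and the default size are assumptions rather than the documented PetDB interface.

```javascript
// Hypothetical helper: builds a radius search around a point, nearest hits first.
function buildProximityQuery({ lat, lon, radiusKm, field = "location", size = 100 }) {
  if (!(lat >= -90 && lat <= 90)) throw new RangeError(`Invalid latitude: ${lat}`);
  if (!(lon >= -180 && lon <= 180)) throw new RangeError(`Invalid longitude: ${lon}`);
  if (!(radiusKm > 0)) throw new RangeError(`Invalid radius: ${radiusKm}`);
  return {
    size,
    query: {
      bool: {
        // A filter clause skips relevance scoring, which a pure distance check does not need.
        filter: [{ geo_distance: { distance: `${radiusKm}km`, [field]: { lat, lon } } }],
      },
    },
    sort: [{ _geo_distance: { [field]: { lat, lon }, order: "asc", unit: "km" } }],
  };
}

console.log(JSON.stringify(buildProximityQuery({ lat: -8.98, lon: 15.5, radiusKm: 25 })));
```

Validating the coordinates before building the query lets the API reject bad input with a clear client error instead of surfacing an OpenSearch parsing failure.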
Search and Filtering Features
- Dynamic Search Props
  - Supports query parameters: sampleCollections, expeditions, authors, journals, publicationYears, sampleNames, etc.
  - Enables combined filters via OR (||) or range (2000-2024) syntax.
  - Automatically parsed and formatted in Express middleware.
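
The OR and range syntax can be parsed in a small Express-style middleware. The sketch below is one possible reading of that syntax, not the actual PetDB parser: the assumption that only `publicationYears` accepts ranges, the `req.filters` property, and the output shape are all hypothetical.

```javascript
// Assumption: only these parameters are interpreted as numeric ranges.
const RANGE_FIELDS = new Set(["publicationYears"]);

// Hypothetical helper: parses one raw value such as "Smith||Jones" or "2000-2024".
function parseSearchValue(name, raw) {
  const value = raw.trim();
  const range = /^(\d+)-(\d+)$/.exec(value);
  if (RANGE_FIELDS.has(name) && range) {
    return { type: "range", gte: Number(range[1]), lte: Number(range[2]) };
  }
  const values = value.split("||").map((part) => part.trim()).filter(Boolean);
  return { type: "terms", values };
}

// Hypothetical helper: parses every non-empty string parameter of a query object.
function parseSearchProps(query) {
  const filters = {};
  for (const [name, raw] of Object.entries(query)) {
    if (typeof raw !== "string" || raw.trim() === "") continue;
    filters[name] = parseSearchValue(name, raw);
  }
  return filters;
}

// Express-style middleware (req, res, next) that attaches the parsed filters.
function searchPropsMiddleware(req, res, next) {
  req.filters = parseSearchProps(req.query);
  next();
}

console.log(JSON.stringify(parseSearchProps({ authors: "Smith||Jones", publicationYears: "2000-2024" })));
```

Restricting range parsing to known numeric parameters avoids misreading values that merely contain a hyphen between digits, such as some sample names.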
- Advanced Filters
  - Supports structured filters for complex objects:
    - analysisTypes=minerals::[acanthite]
    - taxons=meteorite::[nakhlite]
    - geoFeatures=crater::[plum crater]
  - Parses nested arrays and dot notation (e.g., Si.WET → { Si: ["WET"] }).
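
The structured filter syntax can be illustrated with a small parser that handles both the `key::[values]` form and the dot notation. This is a sketch under assumptions: the comma as a separator between multiple bracketed values and the function name are hypothetical, and the real middleware may handle more cases.

```javascript
// Hypothetical helper: parses "crater::[plum crater]" or "Si.WET"
// into an object of the form { key: [values] }.
function parseStructuredFilter(raw) {
  const value = raw.trim();

  // Bracket form: key::[v1, v2, ...] (the comma separator is an assumption).
  const bracket = /^([^:]+)::\[(.*)\]$/.exec(value);
  if (bracket) {
    const items = bracket[2].split(",").map((item) => item.trim()).filter(Boolean);
    return { [bracket[1].trim()]: items };
  }

  // Dot form: key.value, split at the first dot only.
  const dot = value.indexOf(".");
  if (dot > 0) {
    return { [value.slice(0, dot)]: [value.slice(dot + 1)] };
  }

  // A bare key with no values attached.
  return { [value]: [] };
}

console.log(JSON.stringify(parseStructuredFilter("crater::[plum crater]")));
console.log(JSON.stringify(parseStructuredFilter("Si.WET")));
```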
- Location Filters
  - Accepts spatial parameters like boundingBox, precision, polygons, and size.
  - Fully compatible with GeoJSON polygons and multi-bounding-box queries.
  - Enables dynamic geospatial filtering on client UI.
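
To make the multi-bounding-box idea concrete, the sketch below converts a `boundingBox` parameter into OpenSearch `geo_bounding_box` clauses combined with OR semantics. The wire format is an assumption, since the source does not define it: each box is taken to be `west,south,east,north` (the GeoJSON bbox order), multiple boxes are separated by `;`, and the field name `location` is hypothetical.

```javascript
// Hypothetical helper: parses "west,south,east,north" into the corner
// points expected by the geo_bounding_box query.
function parseBoundingBox(raw) {
  const parts = raw.split(",").map((part) => (part.trim() === "" ? NaN : Number(part)));
  if (parts.length !== 4 || parts.some(Number.isNaN)) {
    throw new RangeError(`Invalid boundingBox: ${raw}`);
  }
  const [west, south, east, north] = parts;
  if (south > north) throw new RangeError(`south must not exceed north: ${raw}`);
  return {
    top_left: { lat: north, lon: west },
    bottom_right: { lat: south, lon: east },
  };
}

// Hypothetical helper: a document matches if it falls inside any of the boxes.
function buildBoundingBoxFilter(rawParam, field = "location") {
  const boxes = rawParam.split(";").filter((box) => box.trim() !== "").map(parseBoundingBox);
  return {
    bool: {
      should: boxes.map((box) => ({ geo_bounding_box: { [field]: box } })),
      minimum_should_match: 1,
    },
  };
}

console.log(JSON.stringify(buildBoundingBoxFilter("-10,-5,10,5;100,20,120,40")));
```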
Reliability and System Architecture
- Error Handling & Logging
  - All errors logged with context to ECS logs.
  - Includes structured JSON error responses for API consumers.
  - Standardized error codes across all v4 endpoints.
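
A structured error response with standardized codes might look like the following Express-style error handler. This is a sketch, not the actual PetDB handler: the `ApiError` class, the code strings, and the response shape are assumptions. The log line is written as JSON to stderr on the assumption that the container log driver forwards it.

```javascript
// Hypothetical error type carrying an HTTP status and a stable error code.
class ApiError extends Error {
  constructor(status, code, message) {
    super(message);
    this.status = status;
    this.code = code;
  }
}

// Maps any thrown value to a status and a JSON body. Unknown errors are
// reported generically so internal details do not leak to clients.
function toErrorResponse(err) {
  const known = err instanceof ApiError;
  return {
    status: known ? err.status : 500,
    body: {
      error: {
        code: known ? err.code : "INTERNAL_ERROR",
        message: known ? err.message : "Internal server error",
      },
    },
  };
}

// Express recognizes error-handling middleware by its four-argument signature,
// so `next` stays in the parameter list even though it is unused here.
function errorHandler(err, req, res, next) {
  console.error(JSON.stringify({
    level: "error",
    method: req.method,
    path: req.path,
    message: err && err.message,
  }));
  const { status, body } = toErrorResponse(err);
  res.status(status).json(body);
}

console.log(JSON.stringify(toErrorResponse(new ApiError(400, "VALIDATION_ERROR", "Invalid boundingBox"))));
```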
- Performance Optimizations
  - Built-in caching and pagination for composite aggregations.
  - Scroll API usage for massive data queries (100k+ hits).
  - Streamed responses for export to S3 without overloading memory.
- DynamoDB Integration
  - Centralized tracking of exports, tokens, and task statuses.
  - Uses Point-In-Time Recovery (PITR) and consistent backups.
  - Queryable by user email for audit and recovery.
- OpenSearch Integration
  - Core data retrieval via OpenSearch composite aggregations.
  - Index mappings support nested structures for hierarchical vocabularies.
  - High-availability configuration with scroll + afterKey for pagination.
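
Composite aggregations are paged by feeding the `after_key` of one response into the `after` field of the next request. The sketch below shows that loop with an injected `search` function, so it does not depend on a particular client library. The function names and the aggregation name passed in are assumptions, and `search` is expected to resolve to the response body itself.

```javascript
// Hypothetical helper: returns the request body for the next page, or null
// when the last page has been reached.
function nextCompositePage(requestBody, aggName, response) {
  const agg = response.aggregations && response.aggregations[aggName];
  if (!agg || agg.buckets.length === 0 || !agg.after_key) return null;
  const next = JSON.parse(JSON.stringify(requestBody)); // copy; leave the input untouched
  next.aggs[aggName].composite.after = agg.after_key;
  return next;
}

// Hypothetical helper: collects every bucket by following after_key.
// `search` is any async function that takes a request body and resolves
// to the response body.
async function collectAllBuckets(search, requestBody, aggName) {
  const buckets = [];
  let body = requestBody;
  while (body) {
    const response = await search(body);
    buckets.push(...response.aggregations[aggName].buckets);
    body = nextCompositePage(body, aggName, response);
  }
  return buckets;
}

const firstRequest = {
  size: 0,
  aggs: { names: { composite: { size: 2, sources: [{ name: { terms: { field: "sampleName" } } }] } } },
};
const firstResponse = {
  aggregations: { names: { after_key: { name: "B" }, buckets: [{ key: { name: "A" } }, { key: { name: "B" } }] } },
};
console.log(JSON.stringify(nextCompositePage(firstRequest, "names", firstResponse)));
```

Unlike scroll, this style of pagination keeps no server-side search context open between requests, which makes it a good fit for aggregation endpoints that may be paged slowly by clients.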