FHIR Ingestion System Documentation
Overview
The FHIR ingestion system is a core part of the LyfeAI Provider Platform. It extracts and structures medical data from uploaded documents automatically, using AWS Textract and Comprehend Medical when AWS credentials are configured and falling back to OpenAI otherwise.
Architecture
graph TD
A[User Upload] --> B[Document Upload Handler]
B --> C{File Type?}
C -->|PDF| D[PDF Processing]
C -->|Image| E[Image Processing]
D --> F{AWS Available?}
E --> F
F -->|Yes| G[AWS Textract]
F -->|No| H[OpenAI Vision]
G --> I[AWS Comprehend Medical]
H --> J[OpenAI GPT-4]
I --> K[FHIR Converter]
J --> K
K --> L[Data Validation]
L --> M[Supabase Storage]
M --> N[UI Update]
Core Components
1. Document Upload Handler
File: app/actions/ai-actions.ts
export async function processDocument(formData: FormData) {
const file = formData.get('file') as File;
if (!file) {
throw new Error('No file provided');
}
// Validate file type
const validTypes = ['application/pdf', 'image/jpeg', 'image/png'];
if (!validTypes.includes(file.type)) {
throw new Error('Invalid file type');
}
// Process based on availability of AWS services
let extractedData;
if (process.env.AWS_ACCESS_KEY_ID) {
extractedData = await processWithAWS(file);
} else {
extractedData = await processWithOpenAI(file);
}
// Convert to FHIR format
const fhirData = await convertToFHIR(extractedData);
// Store in database
await storeFHIRData(fhirData);
return { success: true, data: fhirData };
}
2. AWS Processing Pipeline
File: lib/aws-document-processor.ts (conceptual - not fully implemented)
import { TextractClient, AnalyzeDocumentCommand } from "@aws-sdk/client-textract";
import { ComprehendMedicalClient, DetectEntitiesV2Command } from "@aws-sdk/client-comprehendmedical";
async function processWithAWS(file: File): Promise<ExtractedData> {
// Upload to S3
const s3Key = await uploadToS3(file);
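// Extract text with Textract. AnalyzeDocument is a synchronous call that handles
// single-page input; multi-page PDFs need the asynchronous
// StartDocumentAnalysis / GetDocumentAnalysis operations instead.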
const textractClient = new TextractClient({ region: process.env.AWS_REGION });
const textractResult = await textractClient.send(
new AnalyzeDocumentCommand({
Document: { S3Object: { Bucket: process.env.AWS_S3_BUCKET, Name: s3Key } },
FeatureTypes: ["TABLES", "FORMS"]
})
);
// Extract medical entities with Comprehend Medical
const comprehendClient = new ComprehendMedicalClient({ region: process.env.AWS_REGION });
const extractedText = consolidateTextractResults(textractResult);
const medicalEntities = await comprehendClient.send(
new DetectEntitiesV2Command({ Text: extractedText })
);
return {
rawText: extractedText,
entities: medicalEntities.Entities,
tables: extractTables(textractResult),
forms: extractForms(textractResult)
};
}
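The pipeline above calls consolidateTextractResults without showing it. A minimal sketch of what such a helper might do, assuming it only needs the BlockType, Text, and Page fields of each Textract block (the TextBlock interface here is a local stand-in, not the SDK type):

```typescript
// Local stand-in for the Textract Block fields this sketch reads (assumption:
// the real SDK type has many more fields that are not needed here).
interface TextBlock {
  BlockType?: string;
  Text?: string;
  Page?: number;
}

// Hypothetical implementation: keep LINE blocks only (WORD blocks repeat the
// same text), preserve their order, and separate pages with a blank line.
function consolidateTextractResults(result: { Blocks?: TextBlock[] }): string {
  const pages = new Map<number, string[]>();
  for (const block of result.Blocks ?? []) {
    if (block.BlockType !== "LINE" || !block.Text) continue;
    const page = block.Page ?? 1;
    const lines = pages.get(page) ?? [];
    lines.push(block.Text);
    pages.set(page, lines);
  }
  return Array.from(pages.keys())
    .sort((a, b) => a - b)
    .map((page) => (pages.get(page) ?? []).join("\n"))
    .join("\n\n");
}

// Illustrative input, not real Textract output.
const sampleTextract = {
  Blocks: [
    { BlockType: "PAGE", Page: 1 },
    { BlockType: "LINE", Text: "Patient: Jane Roe", Page: 1 },
    { BlockType: "WORD", Text: "Patient:", Page: 1 },
    { BlockType: "LINE", Text: "Allergies: penicillin", Page: 1 },
  ],
};
const consolidated = consolidateTextractResults(sampleTextract);
```

Reading LINE blocks rather than WORD blocks keeps the text in reading order without duplicating words.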
3. OpenAI Fallback Processing
File: lib/ai-service.ts
export async function extractPatientData(content: string): Promise<any> {
try {
const completion = await openai.chat.completions.create({
messages: [
{
role: "system",
content: `You are a medical data extraction specialist. Extract patient information and return it in a structured JSON format.
Extract the following information if available:
- Patient demographics (name, DOB, gender, contact info)
- Medical record number (MRN)
- Allergies
- Current medications
- Medical conditions/diagnoses
- Recent procedures
- Vital signs
- Lab results
Return the data in this exact JSON structure:
{
"patient": {
"name": "string",
"dateOfBirth": "YYYY-MM-DD",
"gender": "string",
"mrn": "string",
"phone": "string",
"email": "string",
"address": "string"
},
"allergies": ["string"],
"medications": [{
"name": "string",
"dosage": "string",
"frequency": "string",
"startDate": "YYYY-MM-DD"
}],
"conditions": [{
"name": "string",
"icdCode": "string",
"dateOfDiagnosis": "YYYY-MM-DD",
"status": "active|resolved|chronic"
}],
"vitals": {
"bloodPressure": "string",
"heartRate": "number",
"temperature": "number",
"weight": "string",
"height": "string"
},
"labResults": [{
"testName": "string",
"value": "string",
"unit": "string",
"referenceRange": "string",
"date": "YYYY-MM-DD",
"status": "normal|abnormal|critical"
}]
}`
},
{
role: "user",
content: content
}
],
model: "gpt-4-turbo-preview",
response_format: { type: "json_object" },
temperature: 0.1,
max_tokens: 4000
});
const result = completion.choices[0].message.content;
return JSON.parse(result || '{}');
} catch (error) {
console.error('OpenAI extraction error:', error);
// Last resort: simulateExtraction returns mock data, not values read from the document
return simulateExtraction(content);
}
}
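Because extractPatientData returns whatever JSON.parse produces, a response with a missing section (for example no "patient" object) would make later property access throw. One way to guard against that is to normalize the parsed value first. This is a sketch only: normalizeExtraction and the ExtractedData shape below are hypothetical and cover just a subset of the fields in the prompt.

```typescript
interface ExtractedPatient {
  name?: string;
  dateOfBirth?: string;
  gender?: string;
  mrn?: string;
}

interface ExtractedData {
  patient: ExtractedPatient;
  allergies: string[];
  medications: unknown[];
  conditions: unknown[];
  labResults: unknown[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function asArray(value: unknown): unknown[] {
  return Array.isArray(value) ? value : [];
}

// Returns a trimmed string, or undefined for non-strings and blank strings.
function optionalString(value: unknown): string | undefined {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : undefined;
}

// Hypothetical helper: coerce parsed JSON into a shape that always has a
// patient object and array-valued sections, so callers need no null checks.
function normalizeExtraction(raw: unknown): ExtractedData {
  const root = isRecord(raw) ? raw : {};
  const patient = isRecord(root.patient) ? root.patient : {};
  return {
    patient: {
      name: optionalString(patient.name),
      dateOfBirth: optionalString(patient.dateOfBirth),
      gender: optionalString(patient.gender),
      mrn: optionalString(patient.mrn),
    },
    allergies: asArray(root.allergies).filter(
      (item): item is string => typeof item === "string"
    ),
    medications: asArray(root.medications),
    conditions: asArray(root.conditions),
    labResults: asArray(root.labResults),
  };
}

const fromEmpty = normalizeExtraction(JSON.parse("{}"));
const fromPartial = normalizeExtraction(
  JSON.parse('{"patient": {"name": " Jane Roe "}, "allergies": ["penicillin", 42]}')
);
```

Dropping non-string allergy entries is a design choice for this sketch; a stricter variant could reject the whole response instead.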
4. FHIR Converter
File: lib/fhir-converter.ts (conceptual)
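import type { Bundle, BundleEntry, Patient, AllergyIntolerance, MedicationStatement, Condition } from '@medplum/fhirtypes';
export async function convertToFHIR(extractedData: any): Promise<Bundle> {
// Declare entry as a required BundleEntry[]: an untyped [] literal is inferred
// as never[], and Bundle.entry is optional, so push() would not type-check
const bundle: Bundle & { entry: BundleEntry[] } = {
resourceType: 'Bundle',
type: 'collection',
entry: []
};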
// Convert patient demographics
const patient: Patient = {
resourceType: 'Patient',
id: generateId(),
identifier: [{
system: 'http://hospital.local/mrn',
value: extractedData.patient.mrn
}],
name: [{
use: 'official',
text: extractedData.patient.name,
family: extractLastName(extractedData.patient.name),
given: [extractFirstName(extractedData.patient.name)]
}],
gender: mapGender(extractedData.patient.gender),
birthDate: extractedData.patient.dateOfBirth,
telecom: [
{
system: 'phone',
value: extractedData.patient.phone,
use: 'mobile'
},
{
system: 'email',
value: extractedData.patient.email
}
],
address: [{
use: 'home',
text: extractedData.patient.address,
...parseAddress(extractedData.patient.address)
}]
};
bundle.entry.push({ resource: patient });
// Convert allergies
for (const allergy of extractedData.allergies || []) {
const allergyResource: AllergyIntolerance = {
resourceType: 'AllergyIntolerance',
id: generateId(),
patient: { reference: `Patient/${patient.id}` },
code: {
text: allergy
},
clinicalStatus: {
coding: [{
system: 'http://terminology.hl7.org/CodeSystem/allergyintolerance-clinical',
code: 'active'
}]
}
};
bundle.entry.push({ resource: allergyResource });
}
// Convert medications
for (const med of extractedData.medications || []) {
const medicationResource: MedicationStatement = {
resourceType: 'MedicationStatement',
id: generateId(),
status: 'active',
subject: { reference: `Patient/${patient.id}` },
medicationCodeableConcept: {
text: med.name
},
dosage: [{
text: `${med.dosage} ${med.frequency}`,
timing: {
repeat: {
frequency: parseFrequency(med.frequency)
}
}
}]
};
bundle.entry.push({ resource: medicationResource });
}
// Convert conditions
for (const condition of extractedData.conditions || []) {
const conditionResource: Condition = {
resourceType: 'Condition',
id: generateId(),
subject: { reference: `Patient/${patient.id}` },
code: {
text: condition.name,
coding: condition.icdCode ? [{
system: 'http://hl7.org/fhir/sid/icd-10',
code: condition.icdCode
}] : undefined
},
clinicalStatus: {
coding: [{
system: 'http://terminology.hl7.org/CodeSystem/condition-clinical',
code: mapConditionStatus(condition.status)
}]
},
onsetDateTime: condition.dateOfDiagnosis
};
bundle.entry.push({ resource: conditionResource });
}
return bundle;
}
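The converter relies on mapping helpers such as mapGender and mapConditionStatus that are not shown. A sketch of how they might look, assuming free-text input from the extraction step. The accepted spellings are illustrative; the output codes are the FHIR administrative-gender values and two of the condition-clinical values.

```typescript
// FHIR administrative-gender codes.
type AdministrativeGender = "male" | "female" | "other" | "unknown";
// Subset of condition-clinical codes that the extraction statuses map onto.
type ConditionClinicalCode = "active" | "resolved";

// Hypothetical mapGender: normalize free-text gender to a FHIR code.
function mapGender(value: string | undefined): AdministrativeGender {
  switch ((value ?? "").trim().toLowerCase()) {
    case "m":
    case "male":
      return "male";
    case "f":
    case "female":
      return "female";
    case "other":
    case "nonbinary":
    case "non-binary":
      return "other";
    default:
      return "unknown"; // missing or unrecognized input is not guessed
  }
}

// Hypothetical mapConditionStatus: returns undefined when the status cannot be
// mapped, so the caller can omit clinicalStatus instead of inventing a code.
function mapConditionStatus(status: string | undefined): ConditionClinicalCode | undefined {
  switch ((status ?? "").trim().toLowerCase()) {
    case "active":
    case "chronic": // "chronic" is not a clinical-status code; treated as active here
      return "active";
    case "resolved":
      return "resolved";
    default:
      return undefined;
  }
}

const genderCode = mapGender(" M ");
const chronicCode = mapConditionStatus("chronic");
```

Since the extraction prompt allows "chronic" as a status but FHIR's condition-clinical code system has no such code, some mapping decision of this kind is needed before the value is written into clinicalStatus.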
5. Enhanced FHIR Service
File: lib/enhanced-fhir-service.ts
export class EnhancedFHIRService {
static async parseFHIRData(data: any): Promise<ParsedFHIRData> {
const result: ParsedFHIRData = {
patient: null,
conditions: [],
medications: [],
allergies: [],
procedures: [],
observations: [],
encounters: []
};
if (data.resourceType === 'Bundle') {
// Process bundle entries
for (const entry of data.entry || []) {
await this.processResource(entry.resource, result);
}
} else if (data.resourceType === 'Patient') {
// Direct patient resource
result.patient = await this.extractPatientData(data);
}
return result;
}
private static async processResource(resource: any, result: ParsedFHIRData) {
// Bundle entries are not required to carry a resource, so guard against undefined
switch (resource?.resourceType) {
case 'Patient':
result.patient = await this.extractPatientData(resource);
break;
case 'Condition':
result.conditions.push(this.extractCondition(resource));
break;
case 'MedicationStatement':
case 'MedicationRequest':
result.medications.push(this.extractMedication(resource));
break;
case 'AllergyIntolerance':
result.allergies.push(this.extractAllergy(resource));
break;
case 'Procedure':
result.procedures.push(this.extractProcedure(resource));
break;
case 'Observation':
result.observations.push(this.extractObservation(resource));
break;
case 'Encounter':
result.encounters.push(this.extractEncounter(resource));
break;
}
}
}
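When debugging the parser, it can help to see what a bundle contains before it is dispatched by resource type. The following is a small standalone sketch, not part of EnhancedFHIRService; countByResourceType and the Minimal* interfaces are hypothetical names.

```typescript
// Only the fields this sketch reads; real FHIR types carry far more.
interface MinimalEntry {
  resource?: { resourceType?: string };
}

interface MinimalBundle {
  resourceType: "Bundle";
  entry?: MinimalEntry[];
}

// Tally entries per resourceType, skipping entries that have no resource.
function countByResourceType(bundle: MinimalBundle): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of bundle.entry ?? []) {
    const type = entry.resource?.resourceType;
    if (!type) continue;
    counts[type] = (counts[type] ?? 0) + 1;
  }
  return counts;
}

const sampleBundle: MinimalBundle = {
  resourceType: "Bundle",
  entry: [
    { resource: { resourceType: "Patient" } },
    { resource: { resourceType: "Condition" } },
    { resource: { resourceType: "Condition" } },
    {}, // entry without a resource
  ],
};
const resourceCounts = countByResourceType(sampleBundle);
```

Comparing these counts with the lengths of the arrays in ParsedFHIRData shows quickly whether any resource type fell through the switch.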
Data Flow
1. Upload Process
- User selects file through UI
- File validated for type and size
- Progress indicator shown
- File sent to server action
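The type and size validation in this step can be sketched as a single pure function. The allowed types and the 10MB limit come from the snippets elsewhere in this document; the empty-file check and the validateUpload name are assumptions added for illustration.

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10MB
const ALLOWED_TYPES = ["application/pdf", "image/jpeg", "image/png"];

// Structural subset of the browser File object, so the check is easy to test.
interface UploadCandidate {
  type: string;
  size: number;
}

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical validator: returns a reason instead of throwing, so the UI
// can show the message next to the file picker.
function validateUpload(file: UploadCandidate): UploadCheck {
  if (file.size === 0) {
    return { ok: false, reason: "File is empty" }; // assumption: empty files are rejected
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "File size exceeds 10MB limit" };
  }
  if (ALLOWED_TYPES.indexOf(file.type) === -1) {
    return { ok: false, reason: "File type not supported" };
  }
  return { ok: true };
}

const pdfCheck = validateUpload({ type: "application/pdf", size: 2048 });
const gifCheck = validateUpload({ type: "image/gif", size: 2048 });
```

Note that file.type is reported by the client, so a server-side check of the actual content is still advisable before processing.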
2. Extraction Process
- Determine processing method (AWS vs OpenAI)
- Extract text and structure from document
- Identify medical entities
- Parse tables and forms
- Handle errors gracefully
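Entity detection services cap the amount of text accepted per request, so long documents usually have to be split before the "identify medical entities" step. Below is a sketch of a line-preserving splitter. The 20,000-character default is an assumption to verify against the current Comprehend Medical quotas; if the quota is expressed in bytes, a byte count would be needed instead.

```typescript
const ASSUMED_MAX_CHARS = 20000; // assumption: check the service quota before relying on it

// Split text into chunks of at most maxChars characters, breaking on line
// boundaries where possible and hard-splitting only lines that are too long.
function chunkText(text: string, maxChars: number = ASSUMED_MAX_CHARS): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";
  for (const line of text.split("\n")) {
    let rest = line;
    // A single line longer than the limit is cut into maxChars-sized pieces.
    while (rest.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    const candidate = current ? current + "\n" + rest : rest;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = rest;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const lineChunks = chunkText("aaa\nbbb\nccc", 7);
const hardSplit = chunkText("abcdefghij", 4);
```

Entity offsets returned for each chunk are relative to that chunk, so they need to be shifted by the chunk's starting position if they are mapped back onto the full text.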
3. Conversion Process
- Map extracted data to FHIR resources
- Validate FHIR structure
- Generate unique identifiers
- Create resource references
- Bundle related resources
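One check that fits the "create resource references" and "validate FHIR structure" steps is confirming that every reference in the bundle points at a resource that is actually in it. A minimal sketch, assuming only the subject and patient reference fields used by the converter need checking (findDanglingReferences is a hypothetical name):

```typescript
interface Ref {
  reference?: string;
}

// Only the fields this check reads.
interface Res {
  resourceType: string;
  id?: string;
  subject?: Ref;
  patient?: Ref;
}

interface Bun {
  entry?: { resource?: Res }[];
}

// Returns a description of each reference whose target is not in the bundle.
function findDanglingReferences(bundle: Bun): string[] {
  const known = new Set<string>();
  for (const entry of bundle.entry ?? []) {
    const res = entry.resource;
    if (res?.id) known.add(`${res.resourceType}/${res.id}`);
  }
  const dangling: string[] = [];
  for (const entry of bundle.entry ?? []) {
    const res = entry.resource;
    if (!res) continue;
    for (const ref of [res.subject, res.patient]) {
      const target = ref?.reference;
      if (target && !known.has(target)) {
        dangling.push(`${res.resourceType}/${res.id ?? "?"} -> ${target}`);
      }
    }
  }
  return dangling;
}

const danglingRefs = findDanglingReferences({
  entry: [
    { resource: { resourceType: "Patient", id: "p1" } },
    { resource: { resourceType: "Condition", id: "c1", subject: { reference: "Patient/p1" } } },
    { resource: { resourceType: "AllergyIntolerance", id: "a1", patient: { reference: "Patient/p2" } } },
  ],
});
```

An empty result means every checked reference resolves within the bundle; references to resources stored elsewhere would need a separate lookup.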
4. Storage Process
- Store FHIR bundle in database
- Update patient record
- Create audit trail
- Trigger UI updates
- Send notifications
Configuration
AWS Configuration
# Required for AWS processing
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
AWS_REGION=us-east-1
AWS_S3_BUCKET=lyfeai-documents
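# IAM Policy needed (S3 access is scoped to the document bucket)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"textract:AnalyzeDocument",
"comprehendmedical:DetectEntitiesV2"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::lyfeai-documents/*"
}
]
}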
OpenAI Configuration
# Required for OpenAI processing
OPENAI_API_KEY=sk-your-key
# Recommended model settings
MODEL=gpt-4-turbo-preview
TEMPERATURE=0.1
MAX_TOKENS=4000
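The handler shown earlier switches to AWS as soon as AWS_ACCESS_KEY_ID is set, which can select a half-configured backend. A sketch of a stricter selection, assuming all four AWS variables listed above must be present (selectBackend and missingVars are hypothetical helpers):

```typescript
type Env = Record<string, string | undefined>;
type Backend = "aws" | "openai";

const AWS_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_S3_BUCKET",
];

// Names from the list that are unset or empty in the given environment.
function missingVars(env: Env, names: string[]): string[] {
  return names.filter((name) => !env[name]);
}

// Prefer AWS only when it is fully configured; otherwise fall back to OpenAI.
function selectBackend(env: Env): Backend {
  if (missingVars(env, AWS_VARS).length === 0) return "aws";
  if (env.OPENAI_API_KEY) return "openai";
  throw new Error(
    "No processing backend configured: set the AWS variables or OPENAI_API_KEY"
  );
}

const partialAws = selectBackend({
  AWS_ACCESS_KEY_ID: "your-key",
  OPENAI_API_KEY: "sk-your-key",
});
```

In the server action this would be called as selectBackend(process.env). Deployments that obtain AWS credentials from an IAM role rather than environment variables would need a different check.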
Error Handling
1. Upload Errors
// File too large
if (file.size > 10 * 1024 * 1024) {
throw new Error('File size exceeds 10MB limit');
}
// Invalid file type
if (!ALLOWED_TYPES.includes(file.type)) {
throw new Error('File type not supported');
}
// Network error
try {
await uploadFile(file);
} catch (error) {
throw new Error('Upload failed. Please try again.');
}
2. Processing Errors
// AWS service errors
let textractResult;
try {
// Declared outside the try block so the result is usable after it
textractResult = await textractClient.send(command);
} catch (error) {
console.error('Textract error:', error);
// Fallback to OpenAI
return processWithOpenAI(file);
}
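// OpenAI errors
try {
return await openai.chat.completions.create(params);
} catch (error) {
// The SDK raises OpenAI.RateLimitError for HTTP 429 responses.
// MAX_RETRIES is an assumed constant that keeps the retry loop bounded.
if (error instanceof OpenAI.RateLimitError && retryCount < MAX_RETRIES) {
// Exponential backoff: 1s, 2s, 4s, ...
await delay(1000 * Math.pow(2, retryCount));
return retry();
}
// Use mock data as last resort
return simulateExtraction(content);
}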
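The backoff logic above can be factored into a reusable wrapper. This is a sketch under stated assumptions: the retry cap, the delay cap, and the withRetry signature are illustrative, and the sleep function is injected so the wrapper can be tested without real timers.

```typescript
const MAX_RETRIES = 5; // assumption: retries after the first attempt
const BASE_DELAY_MS = 1000; // matches the 1000ms base used above
const MAX_DELAY_MS = 30000; // assumption: upper bound on a single wait

// Delay before retry number retryCount (0-based): 1s, 2s, 4s, ... capped.
function backoffDelayMs(retryCount: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, retryCount));
}

// Runs operation, retrying retryable failures with exponential backoff.
// Makes at most MAX_RETRIES + 1 attempts, then rethrows the last error.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= MAX_RETRIES || !isRetryable(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

const firstDelay = backoffDelayMs(0);
const cappedDelay = backoffDelayMs(10);
```

In production, sleep would typically wrap setTimeout in a Promise, and isRetryable would test for the client library's rate-limit error. Adding random jitter to the delay helps avoid many clients retrying at the same instant.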
3. FHIR Validation Errors
// Invalid FHIR structure
try {
validateFHIRResource(resource);
} catch (error) {
console.error('FHIR validation error:', error);
// Attempt to fix common issues, then validate again so a failed repair still throws
resource = fixCommonFHIRIssues(resource);
validateFHIRResource(resource);
}
// Missing required fields (Patient.name is an array, so check its length)
if (!patient.name?.length || !patient.birthDate) {
throw new Error('Patient must have name and birth date');
}
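The required-field check can be extended to collect every problem at once rather than throwing on the first. A sketch, assuming the platform rule stated above that a patient needs a name and a birth date (base FHIR does not require either); validatePatient and the simplified date pattern are illustrative:

```typescript
interface HumanNameLike {
  text?: string;
  family?: string;
  given?: string[];
}

interface PatientLike {
  resourceType: string;
  name?: HumanNameLike[];
  birthDate?: string;
}

// Simplified pattern for FHIR dates: YYYY, YYYY-MM, or YYYY-MM-DD.
// It does not check days per month (for example, it accepts 2023-02-31).
const FHIR_DATE = /^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$/;

// Returns a list of human-readable issues; an empty list means the checks passed.
function validatePatient(patient: PatientLike): string[] {
  const issues: string[] = [];
  if (patient.resourceType !== "Patient") {
    issues.push("resourceType must be 'Patient'");
  }
  const hasName = (patient.name ?? []).some((n) =>
    Boolean(n.text || n.family || (n.given && n.given.length > 0))
  );
  if (!hasName) issues.push("Patient.name is missing or empty");
  if (!patient.birthDate) {
    issues.push("Patient.birthDate is missing");
  } else if (!FHIR_DATE.test(patient.birthDate)) {
    issues.push("Patient.birthDate is not a FHIR date (YYYY, YYYY-MM, or YYYY-MM-DD)");
  }
  return issues;
}

const validIssues = validatePatient({
  resourceType: "Patient",
  name: [{ text: "John Doe" }],
  birthDate: "1980-01-01",
});
const invalidIssues = validatePatient({
  resourceType: "Patient",
  name: [],
  birthDate: "01/02/1980",
});
```

Returning all issues together lets the UI show the reviewer everything that needs correcting in one pass.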
Performance Optimization
1. Caching Strategy
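// Cache extracted data, keyed by a hash of the file contents (e.g. SHA-256)
const cacheKey = `extraction:${fileHash}`;
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Process and cache the result for one hour (the cached value contains PHI)
// processDocument expects the FormData that carries the file
const result = await processDocument(formData);
await redis.set(cacheKey, JSON.stringify(result), 'EX', 3600);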
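For unit tests or local development without Redis, the same look-up-then-store pattern can be exercised with an in-memory stand-in. This is a sketch only: TtlCache is a hypothetical class, and the clock is injected so that expiry can be tested without waiting.

```typescript
// Minimal in-memory cache with per-entry expiry, mirroring SET ... EX semantics.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if absent or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Fake clock so the one-hour expiry can be simulated instantly.
let fakeTime = 0;
const cache = new TtlCache<string>(3600 * 1000, () => fakeTime);
cache.set("extraction:abc123", '{"success":true}');
const hit = cache.get("extraction:abc123");
fakeTime = 3600 * 1000;
const afterExpiry = cache.get("extraction:abc123");
```

Whatever the backing store, cached extraction results contain PHI, so the expiry and the store's access controls fall under the same security requirements as the source documents.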
2. Parallel Processing
// Process multiple pages in parallel
const pages = await splitPDF(file);
const results = await Promise.all(
pages.map(page => processPage(page))
);
return mergeResults(results);
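The mergeResults step is not shown above. One possible sketch, assuming each page yields a partial extraction with patient fields, allergies, and medications; the PageResult shape and the "first non-empty value wins" rule are assumptions made for illustration:

```typescript
interface PageResult {
  patient: Record<string, string | undefined>;
  allergies: string[];
  medications: { name: string }[];
}

// Merge per-page results: the first non-empty value wins for each patient
// field, and list items are de-duplicated case-insensitively.
function mergeResults(pages: PageResult[]): PageResult {
  const merged: PageResult = { patient: {}, allergies: [], medications: [] };
  const seenAllergies = new Set<string>();
  const seenMedications = new Set<string>();
  for (const page of pages) {
    for (const key of Object.keys(page.patient)) {
      const value = page.patient[key];
      if (value && !merged.patient[key]) merged.patient[key] = value;
    }
    for (const allergy of page.allergies) {
      const normalized = allergy.trim().toLowerCase();
      if (normalized && !seenAllergies.has(normalized)) {
        seenAllergies.add(normalized);
        merged.allergies.push(allergy.trim());
      }
    }
    for (const medication of page.medications) {
      const normalized = medication.name.trim().toLowerCase();
      if (normalized && !seenMedications.has(normalized)) {
        seenMedications.add(normalized);
        merged.medications.push(medication);
      }
    }
  }
  return merged;
}

const mergedPages = mergeResults([
  { patient: { name: "Jane Roe" }, allergies: ["Penicillin"], medications: [{ name: "Metformin" }] },
  {
    patient: { name: "J. Roe", mrn: "12345" },
    allergies: ["penicillin ", "Latex"],
    medications: [{ name: "metformin" }, { name: "Lisinopril" }],
  },
]);
```

Promise.all returns results in input order regardless of which page finishes first, so "first wins" here means the earliest page in the document.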
3. Streaming Large Files
// Stream processing for large documents
const stream = file.stream();
const chunks = [];
let bytesRead = 0;
for await (const chunk of stream) {
const processed = await processChunk(chunk);
chunks.push(processed);
// The chunk count is not known up front, so report progress as bytes read
bytesRead += chunk.byteLength;
onProgress(bytesRead / file.size);
}
Testing
Unit Tests
describe('FHIR Converter', () => {
it('should convert patient data to FHIR', async () => {
const input = {
patient: {
name: 'John Doe',
dateOfBirth: '1980-01-01',
gender: 'male'
}
};
// convertToFHIR is async, so the bundle must be awaited
const result = await convertToFHIR(input);
expect(result.entry[0].resource.resourceType).toBe('Patient');
expect(result.entry[0].resource.name[0].text).toBe('John Doe');
});
});
Integration Tests
describe('Document Processing', () => {
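it('should process PDF and extract data', async () => {
const file = new File(['mock pdf content'], 'test.pdf', {
type: 'application/pdf'
});
// processDocument reads the file from FormData under the 'file' key
const formData = new FormData();
formData.append('file', file);
const result = await processDocument(formData);
expect(result.success).toBe(true);
// The returned data is a FHIR Bundle whose first entry is the Patient
expect(result.data.resourceType).toBe('Bundle');
expect(result.data.entry[0].resource.resourceType).toBe('Patient');
});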
});
Known Issues
- Large File Handling: Files over 10MB may time out
- Handwriting Recognition: Poor accuracy on handwritten notes
- Table Extraction: Complex tables may lose structure
- Image Quality: Low-resolution images may fail to process
- Language Support: English only currently
Future Enhancements
- Batch Processing: Upload multiple files at once
- Progress Streaming: Real-time extraction progress
- Custom Templates: Define extraction templates per document type
- ML Training: Train custom models on hospital-specific formats
- Webhook Support: Notify external systems on completion
Security Considerations
- PHI Protection: Treat every uploaded document as containing PHI
- Encryption: Encrypt files at rest and in transit
- Access Control: Limit who can upload documents
- Audit Trail: Log all document access
- Retention Policy: Auto-delete after processing
Conclusion
The FHIR ingestion system automates the extraction of medical data from uploaded documents and its conversion to FHIR resources. The current implementation is a foundation rather than a finished product: the AWS pipeline and the FHIR converter are still conceptual, and production deployment requires addressing the security, performance, and reliability considerations outlined above. Because the system is modular, each stage can be enhanced or adapted to a specific healthcare provider's needs without rewriting the others.