FHIR Ingestion System Documentation

Overview

The FHIR ingestion system extracts medical data from uploaded documents and structures it as FHIR resources for the LyfeAI Provider Platform. When AWS credentials are configured it uses AWS Textract and Comprehend Medical; otherwise it falls back to OpenAI.

Architecture

graph TD
    A[User Upload] --> B[Document Upload Handler]
    B --> C{File Type?}
    C -->|PDF| D[PDF Processing]
    C -->|Image| E[Image Processing]
    D --> F{AWS Available?}
    E --> F
    F -->|Yes| G[AWS Textract]
    F -->|No| H[OpenAI Vision]
    G --> I[AWS Comprehend Medical]
    H --> J[OpenAI GPT-4]
    I --> K[FHIR Converter]
    J --> K
    K --> L[Data Validation]
    L --> M[Supabase Storage]
    M --> N[UI Update]

Core Components

1. Document Upload Handler

File: app/actions/ai-actions.ts

export async function processDocument(formData: FormData) {
  const file = formData.get('file') as File;
  
  if (!file) {
    throw new Error('No file provided');
  }

  // Validate file type
  const validTypes = ['application/pdf', 'image/jpeg', 'image/png'];
  if (!validTypes.includes(file.type)) {
    throw new Error('Invalid file type');
  }

  // Process based on availability of AWS services
  let extractedData;
  if (process.env.AWS_ACCESS_KEY_ID) {
    extractedData = await processWithAWS(file);
  } else {
    extractedData = await processWithOpenAI(file);
  }

  // Convert to FHIR format
  const fhirData = await convertToFHIR(extractedData);
  
  // Store in database
  await storeFHIRData(fhirData);
  
  return { success: true, data: fhirData };
}

2. AWS Processing Pipeline

File: lib/aws-document-processor.ts (conceptual - not fully implemented)

import { TextractClient, AnalyzeDocumentCommand } from "@aws-sdk/client-textract";
import { ComprehendMedicalClient, DetectEntitiesV2Command } from "@aws-sdk/client-comprehendmedical";

async function processWithAWS(file: File): Promise<ExtractedData> {
  // Upload to S3
  const s3Key = await uploadToS3(file);
  
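  // Extract text with Textract.
  // Note: the synchronous AnalyzeDocument call accepts single-page documents only;
  // multi-page PDFs need the asynchronous StartDocumentAnalysis API instead.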
  const textractClient = new TextractClient({ region: process.env.AWS_REGION });
  const textractResult = await textractClient.send(
    new AnalyzeDocumentCommand({
      Document: { S3Object: { Bucket: process.env.AWS_S3_BUCKET, Name: s3Key } },
      FeatureTypes: ["TABLES", "FORMS"]
    })
  );
  
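  // Extract medical entities with Comprehend Medical.
  // Note: DetectEntitiesV2 limits the input size (roughly 20,000 characters per request),
  // so longer documents must be split into chunks first.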
  const comprehendClient = new ComprehendMedicalClient({ region: process.env.AWS_REGION });
  const extractedText = consolidateTextractResults(textractResult);
  
  const medicalEntities = await comprehendClient.send(
    new DetectEntitiesV2Command({ Text: extractedText })
  );
  
  return {
    rawText: extractedText,
    entities: medicalEntities.Entities,
    tables: extractTables(textractResult),
    forms: extractForms(textractResult)
  };
}
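
The pipeline above calls `consolidateTextractResults` without showing it. A minimal sketch follows, assuming the helper simply joins Textract's `LINE` blocks in the order they are returned. The `TextractBlock` and `TextractResultLike` interfaces are hypothetical local subsets of the SDK response type so the example stands alone; the real helper may differ.

```typescript
// Hypothetical subset of the Textract Block shape, declared locally so the sketch is self-contained.
interface TextractBlock {
  BlockType?: string;
  Text?: string;
}

interface TextractResultLike {
  Blocks?: TextractBlock[];
}

// Assumed behaviour: keep only LINE blocks (WORD blocks repeat the same text)
// and join them with newlines, skipping empty lines.
function consolidateTextractResults(result: TextractResultLike): string {
  return (result.Blocks ?? [])
    .filter((block) => block.BlockType === "LINE" && typeof block.Text === "string")
    .map((block) => (block.Text as string).trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```

Because the function only reads `Blocks`, `BlockType`, and `Text`, an SDK response object should be accepted structurally without a cast.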

3. OpenAI Fallback Processing

File: lib/ai-service.ts

export async function extractPatientData(content: string): Promise<any> {
  try {
    const completion = await openai.chat.completions.create({
      messages: [
        {
          role: "system",
          content: `You are a medical data extraction specialist. Extract patient information and return it in a structured JSON format.
          
          Extract the following information if available:
          - Patient demographics (name, DOB, gender, contact info)
          - Medical record number (MRN)
          - Allergies
          - Current medications
          - Medical conditions/diagnoses
          - Recent procedures
          - Vital signs
          - Lab results
          
          Return the data in this exact JSON structure:
          {
            "patient": {
              "name": "string",
              "dateOfBirth": "YYYY-MM-DD",
              "gender": "string",
              "mrn": "string",
              "phone": "string",
              "email": "string",
              "address": "string"
            },
            "allergies": ["string"],
            "medications": [{
              "name": "string",
              "dosage": "string",
              "frequency": "string",
              "startDate": "YYYY-MM-DD"
            }],
            "conditions": [{
              "name": "string",
              "icdCode": "string",
              "dateOfDiagnosis": "YYYY-MM-DD",
              "status": "active|resolved|chronic"
            }],
            "vitals": {
              "bloodPressure": "string",
              "heartRate": "number",
              "temperature": "number",
              "weight": "string",
              "height": "string"
            },
            "labResults": [{
              "testName": "string",
              "value": "string",
              "unit": "string",
              "referenceRange": "string",
              "date": "YYYY-MM-DD",
              "status": "normal|abnormal|critical"
            }]
          }`
        },
        {
          role: "user",
          content: content
        }
      ],
      model: "gpt-4-turbo-preview",
      response_format: { type: "json_object" },
      temperature: 0.1,
      max_tokens: 4000
    });

    const result = completion.choices[0].message.content;
    return JSON.parse(result || '{}');
  } catch (error) {
    console.error('OpenAI extraction error:', error);
    return simulateExtraction(content);
  }
}
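
`JSON.parse` returns untyped data, and the model may omit keys or return values of the wrong type. Below is a minimal sketch of a defensive normalisation step that could run before conversion. `normalizeExtraction` and the `ExtractedPatientData` type are hypothetical names, and only a few of the prompt's fields are covered.

```typescript
// Hypothetical, trimmed-down shape of the JSON the prompt asks for.
interface ExtractedPatientData {
  patient: { name?: string; dateOfBirth?: string; gender?: string; mrn?: string };
  allergies: string[];
  medications: { name: string; dosage?: string; frequency?: string }[];
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Returns a trimmed, non-empty string or undefined.
function asString(value: unknown): string | undefined {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : undefined;
}

// Returns the value as a plain object, or an empty object for anything else.
function asRecord(value: unknown): Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : {};
}

function normalizeExtraction(raw: unknown): ExtractedPatientData {
  const root = asRecord(raw);
  const patient = asRecord(root.patient);
  const dob = asString(patient.dateOfBirth);

  const allergies = Array.isArray(root.allergies)
    ? root.allergies.map(asString).filter((a): a is string => a !== undefined)
    : [];

  const medications: ExtractedPatientData["medications"] = [];
  if (Array.isArray(root.medications)) {
    for (const item of root.medications) {
      const med = asRecord(item);
      const name = asString(med.name);
      // A medication without a name is not usable downstream, so it is skipped.
      if (name) {
        medications.push({ name, dosage: asString(med.dosage), frequency: asString(med.frequency) });
      }
    }
  }

  return {
    patient: {
      name: asString(patient.name),
      // Drop dates that are not in the YYYY-MM-DD form the prompt requests.
      dateOfBirth: dob && ISO_DATE.test(dob) ? dob : undefined,
      gender: asString(patient.gender),
      mrn: asString(patient.mrn),
    },
    allergies,
    medications,
  };
}
```

With a helper like this, the return statement in `extractPatientData` could become `return normalizeExtraction(JSON.parse(result || '{}'))`, which also allows the `Promise<any>` return type to be tightened.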

4. FHIR Converter

File: lib/fhir-converter.ts (conceptual)

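import { Bundle, BundleEntry, Patient, AllergyIntolerance, MedicationStatement, Condition } from '@medplum/fhirtypes';

export async function convertToFHIR(extractedData: any): Promise<Bundle> {
  // Type the bundle explicitly: an untyped `entry: []` is inferred as never[] under strict mode,
  // which makes every later bundle.entry.push(...) a compile error.
  const bundle: Bundle & { entry: BundleEntry[] } = {
    resourceType: 'Bundle',
    type: 'collection',
    entry: []
  };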

  // Convert patient demographics
  const patient: Patient = {
    resourceType: 'Patient',
    id: generateId(),
    identifier: [{
      system: 'http://hospital.local/mrn',
      value: extractedData.patient.mrn
    }],
    name: [{
      use: 'official',
      text: extractedData.patient.name,
      family: extractLastName(extractedData.patient.name),
      given: [extractFirstName(extractedData.patient.name)]
    }],
    gender: mapGender(extractedData.patient.gender),
    birthDate: extractedData.patient.dateOfBirth,
    telecom: [
      {
        system: 'phone',
        value: extractedData.patient.phone,
        use: 'mobile'
      },
      {
        system: 'email',
        value: extractedData.patient.email
      }
    ],
    address: [{
      use: 'home',
      text: extractedData.patient.address,
      ...parseAddress(extractedData.patient.address)
    }]
  };
  
  bundle.entry.push({ resource: patient });

  // Convert allergies
  for (const allergy of extractedData.allergies || []) {
    const allergyResource: AllergyIntolerance = {
      resourceType: 'AllergyIntolerance',
      id: generateId(),
      patient: { reference: `Patient/${patient.id}` },
      code: {
        text: allergy
      },
      clinicalStatus: {
        coding: [{
          system: 'http://terminology.hl7.org/CodeSystem/allergyintolerance-clinical',
          code: 'active'
        }]
      }
    };
    bundle.entry.push({ resource: allergyResource });
  }

  // Convert medications
  for (const med of extractedData.medications || []) {
    const medicationResource: MedicationStatement = {
      resourceType: 'MedicationStatement',
      id: generateId(),
      status: 'active',
      subject: { reference: `Patient/${patient.id}` },
      medicationCodeableConcept: {
        text: med.name
      },
      dosage: [{
        text: `${med.dosage} ${med.frequency}`,
        timing: {
          repeat: {
            frequency: parseFrequency(med.frequency)
          }
        }
      }]
    };
    bundle.entry.push({ resource: medicationResource });
  }

  // Convert conditions
  for (const condition of extractedData.conditions || []) {
    const conditionResource: Condition = {
      resourceType: 'Condition',
      id: generateId(),
      subject: { reference: `Patient/${patient.id}` },
      code: {
        text: condition.name,
        coding: condition.icdCode ? [{
          system: 'http://hl7.org/fhir/sid/icd-10',
          code: condition.icdCode
        }] : undefined
      },
      clinicalStatus: {
        coding: [{
          system: 'http://terminology.hl7.org/CodeSystem/condition-clinical',
          code: mapConditionStatus(condition.status)
        }]
      },
      onsetDateTime: condition.dateOfDiagnosis
    };
    bundle.entry.push({ resource: conditionResource });
  }

  return bundle;
}
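
The converter relies on mapping helpers such as `mapGender`, `mapConditionStatus`, and `parseFrequency` that are not shown. The following sketch illustrates one possible shape for them. The accepted input spellings, the treatment of "chronic" as an active condition, and the fallback to "active" for unrecognised statuses are all assumptions that should be reviewed clinically, not the project's actual rules.

```typescript
type FhirGender = "male" | "female" | "other" | "unknown";
type ConditionClinicalCode = "active" | "resolved";

// FHIR's administrative-gender value set allows only male | female | other | unknown.
function mapGender(input: string | undefined): FhirGender {
  const value = (input ?? "").trim().toLowerCase();
  if (value === "m" || value === "male") return "male";
  if (value === "f" || value === "female") return "female";
  if (value === "") return "unknown";
  return "other";
}

// The extraction prompt emits active | resolved | chronic. "chronic" is not a
// condition-clinical code, so this sketch assumes chronic conditions are ongoing.
// Unrecognised values also fall back to "active" (an assumption).
function mapConditionStatus(status: string | undefined): ConditionClinicalCode {
  return (status ?? "").trim().toLowerCase() === "resolved" ? "resolved" : "active";
}

// Times per day for a few common phrasings; undefined when the text is not recognised.
// More specific patterns come first so "twice daily" is not matched by the plain "daily" rule.
function parseFrequency(text: string | undefined): number | undefined {
  const value = (text ?? "").trim().toLowerCase();
  const rules: { pattern: RegExp; perDay: number }[] = [
    { pattern: /\b(four times (a day|daily)|qid)\b/, perDay: 4 },
    { pattern: /\b(three times (a day|daily)|tid)\b/, perDay: 3 },
    { pattern: /\b(twice (a day|daily)|bid)\b/, perDay: 2 },
    { pattern: /\b(once (a day|daily)|daily|qd)\b/, perDay: 1 },
  ];
  for (const rule of rules) {
    if (rule.pattern.test(value)) return rule.perDay;
  }
  return undefined;
}
```

Note that `timing.repeat.frequency` is only meaningful alongside `period` and `periodUnit` (for example `period: 1, periodUnit: 'd'`), so a fuller converter would set those as well.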

5. Enhanced FHIR Service

File: lib/enhanced-fhir-service.ts

export class EnhancedFHIRService {
  static async parseFHIRData(data: any): Promise<ParsedFHIRData> {
    const result: ParsedFHIRData = {
      patient: null,
      conditions: [],
      medications: [],
      allergies: [],
      procedures: [],
      observations: [],
      encounters: []
    };

    if (data.resourceType === 'Bundle') {
      // Process bundle entries
      for (const entry of data.entry || []) {
        await this.processResource(entry.resource, result);
      }
    } else if (data.resourceType === 'Patient') {
      // Direct patient resource
      result.patient = await this.extractPatientData(data);
    }

    return result;
  }

  private static async processResource(resource: any, result: ParsedFHIRData) {
    switch (resource.resourceType) {
      case 'Patient':
        result.patient = await this.extractPatientData(resource);
        break;
      case 'Condition':
        result.conditions.push(this.extractCondition(resource));
        break;
      case 'MedicationStatement':
      case 'MedicationRequest':
        result.medications.push(this.extractMedication(resource));
        break;
      case 'AllergyIntolerance':
        result.allergies.push(this.extractAllergy(resource));
        break;
      case 'Procedure':
        result.procedures.push(this.extractProcedure(resource));
        break;
      case 'Observation':
        result.observations.push(this.extractObservation(resource));
        break;
      case 'Encounter':
        result.encounters.push(this.extractEncounter(resource));
        break;
    }
  }
}
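
The class above delegates to per-resource extractors that are not shown. As an illustration only, here is a sketch of what `extractCondition` might look like. The `ParsedCondition` shape and the fallback order for the display name are assumptions, not the service's actual implementation.

```typescript
// Minimal local types so the sketch stands alone; the real service likely uses richer FHIR types.
interface Coding { system?: string; code?: string; display?: string }
interface CodeableConcept { text?: string; coding?: Coding[] }
interface ConditionResource {
  resourceType: "Condition";
  id?: string;
  code?: CodeableConcept;
  clinicalStatus?: CodeableConcept;
  onsetDateTime?: string;
}

// Hypothetical flattened shape for UI consumption.
interface ParsedCondition {
  id?: string;
  name: string;
  icdCode?: string;
  status?: string;
  onsetDate?: string;
}

const ICD10_SYSTEMS = ["http://hl7.org/fhir/sid/icd-10", "http://hl7.org/fhir/sid/icd-10-cm"];

function extractCondition(resource: ConditionResource): ParsedCondition {
  const codings = resource.code?.coding ?? [];
  const icd = codings.find((c) => c.system !== undefined && ICD10_SYSTEMS.includes(c.system));
  const firstDisplay = codings.find((c) => c.display !== undefined)?.display;
  return {
    id: resource.id,
    // Prefer the free-text label, then any coding display, then a placeholder.
    name: resource.code?.text ?? firstDisplay ?? "Unknown condition",
    icdCode: icd?.code,
    status: resource.clinicalStatus?.coding?.[0]?.code,
    onsetDate: resource.onsetDateTime,
  };
}
```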

Data Flow

1. Upload Process

User selects file through UI
File validated for type and size
Progress indicator shown
File sent to server action

2. Extraction Process

Determine processing method (AWS vs OpenAI)
Extract text and structure from document
Identify medical entities
Parse tables and forms
Handle errors gracefully

3. Conversion Process

Map extracted data to FHIR resources
Validate FHIR structure
Generate unique identifiers
Create resource references
Bundle related resources
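
One part of validating the bundle is confirming that every reference created during conversion points at a resource that is actually in the bundle. A small sketch of that check is shown here. `findDanglingReferences` is a hypothetical helper, it inspects only the `subject` and `patient` fields used by the converter, and it is not a substitute for a full FHIR validator.

```typescript
interface Reference { reference?: string }
interface ResourceLike {
  resourceType: string;
  id?: string;
  subject?: Reference;
  patient?: Reference;
}
interface BundleLike { entry?: { resource?: ResourceLike }[] }

function findDanglingReferences(bundle: BundleLike): string[] {
  const resources: ResourceLike[] = [];
  for (const entry of bundle.entry ?? []) {
    if (entry.resource) resources.push(entry.resource);
  }

  // Index every resource as "ResourceType/id", the relative reference form the converter emits.
  const known = new Set(
    resources.filter((r) => r.id !== undefined).map((r) => `${r.resourceType}/${r.id}`)
  );

  const dangling: string[] = [];
  for (const resource of resources) {
    for (const ref of [resource.subject, resource.patient]) {
      const target = ref?.reference;
      if (target !== undefined && !known.has(target)) {
        dangling.push(`${resource.resourceType}/${resource.id ?? "?"} -> ${target}`);
      }
    }
  }
  return dangling;
}
```

An empty result means every `subject` and `patient` reference resolves inside the bundle; any entries returned identify the referring resource and the missing target.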

4. Storage Process

Store FHIR bundle in database
Update patient record
Create audit trail
Trigger UI updates
Send notifications

Configuration

AWS Configuration

# Required for AWS processing
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
AWS_REGION=us-east-1
AWS_S3_BUCKET=lyfeai-documents

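# IAM policy needed (S3 access is scoped to the document bucket rather than "*")
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "textract:AnalyzeDocument",
        "comprehendmedical:DetectEntitiesV2"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::lyfeai-documents/*"
    }
  ]
}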

OpenAI Configuration

# Required for OpenAI processing
OPENAI_API_KEY=sk-your-key

# Recommended model settings
MODEL=gpt-4-turbo-preview
TEMPERATURE=0.1
MAX_TOKENS=4000
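
The upload handler shown earlier checks only `AWS_ACCESS_KEY_ID`, yet the AWS path needs all four AWS variables listed above. A hedged sketch of a stricter selection step follows. `selectProvider` is a hypothetical helper, and it takes the environment as a parameter so it can be exercised without touching `process.env`.

```typescript
type Provider = "aws" | "openai";
type Env = Record<string, string | undefined>;

const AWS_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "AWS_S3_BUCKET"];

// Chooses AWS only when every AWS variable is set, otherwise OpenAI when its key is set.
// Throws when neither provider is fully configured, naming the missing AWS variables.
function selectProvider(env: Env): Provider {
  const missingAws = AWS_VARS.filter((name) => !env[name]);
  if (missingAws.length === 0) return "aws";
  if (env.OPENAI_API_KEY) return "openai";
  throw new Error(
    `No extraction provider configured: set OPENAI_API_KEY or the missing AWS variables (${missingAws.join(", ")})`
  );
}
```

In the handler this would be called as `selectProvider(process.env)`, so a partially configured AWS setup falls back to OpenAI instead of failing inside the AWS pipeline.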

Error Handling

1. Upload Errors

// File too large
if (file.size > 10 * 1024 * 1024) {
  throw new Error('File size exceeds 10MB limit');
}

// Invalid file type
if (!ALLOWED_TYPES.includes(file.type)) {
  throw new Error('File type not supported');
}

// Network error
try {
  await uploadFile(file);
} catch (error) {
  throw new Error('Upload failed. Please try again.');
}

2. Processing Errors

// AWS service errors
try {
  const result = await textractClient.send(command);
} catch (error) {
  console.error('Textract error:', error);
  // Fallback to OpenAI
  return processWithOpenAI(file);
}

// OpenAI errors
try {
  const result = await openai.chat.completions.create(params);
} catch (error) {
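  // `error` is typed `unknown` under strict TypeScript, so narrow it before reading `code`
  if ((error as { code?: string }).code === 'rate_limit_exceeded') {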
    // Implement exponential backoff
    await delay(1000 * Math.pow(2, retryCount));
    return retry();
  }
  // Use mock data as last resort
  return simulateExtraction(content);
}
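
The rate-limit branch above refers to `retry`, `delay`, and `retryCount` without defining them. One way to package that logic is a small wrapper. In this sketch `withRetry`, its options, and the `isRetryable` predicate are illustrative names rather than part of the OpenAI SDK.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (error: unknown) => boolean;
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt));
}

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `operation`, retrying with exponential backoff while the error is retryable
// and attempts remain. The last error is rethrown once retries are exhausted.
async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts - 1;
      if (isLastAttempt || !options.isRetryable(error)) break;
      await delay(backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
  throw lastError;
}
```

The OpenAI call could then be wrapped as `withRetry(() => openai.chat.completions.create(params), { maxAttempts: 4, baseDelayMs: 1000, maxDelayMs: 8000, isRetryable })`, leaving the mock-data fallback as an explicit decision in the caller's own `catch`.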

3. FHIR Validation Errors

// Invalid FHIR structure
try {
  validateFHIRResource(resource);
} catch (error) {
  console.error('FHIR validation error:', error);
  // Attempt to fix common issues
  resource = fixCommonFHIRIssues(resource);
}

// Missing required fields
if (!patient.name || !patient.birthDate) {
  throw new Error('Patient must have name and birth date');
}

Performance Optimization

1. Caching Strategy

// Cache extracted data
const cacheKey = `extraction:${fileHash}`;
const cached = await redis.get(cacheKey);
if (cached) {
  return JSON.parse(cached);
}

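// Process and cache result (processDocument takes the upload's FormData, not the File itself).
// The cached value contains PHI, so the cache needs the same protections as the database.
const result = await processDocument(formData);
await redis.set(cacheKey, JSON.stringify(result), 'EX', 3600);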

2. Parallel Processing

// Process multiple pages in parallel
const pages = await splitPDF(file);
const results = await Promise.all(
  pages.map(page => processPage(page))
);
return mergeResults(results);
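
`Promise.all` over every page starts all requests at once, which can run into provider rate limits on long documents. Below is a sketch of a bounded alternative. `mapWithConcurrency` is a hypothetical helper, and the limit of 3 in the usage note is an arbitrary illustration.

```typescript
// Applies `worker` to every item with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) runners.push(runner());
  await Promise.all(runners);
  return results;
}
```

Applied to the snippet above, this would read `const results = await mapWithConcurrency(pages, 3, (page) => processPage(page));` followed by the same `mergeResults(results)` call. As with `Promise.all`, the first rejected page rejects the whole call.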

3. Streaming Large Files

// Stream processing for large documents
const stream = file.stream();
const chunks = [];

for await (const chunk of stream) {
  const processed = await processChunk(chunk);
  chunks.push(processed);
  
  // Update progress
  onProgress(chunks.length / totalChunks);
}

Testing

Unit Tests

describe('FHIR Converter', () => {
  it('should convert patient data to FHIR', async () => {
    const input = {
      patient: {
        name: 'John Doe',
        dateOfBirth: '1980-01-01',
        gender: 'male'
      }
    };
    
    // convertToFHIR is async, so the result must be awaited
    const result = await convertToFHIR(input);
    
    expect(result.entry[0].resource.resourceType).toBe('Patient');
    expect(result.entry[0].resource.name[0].text).toBe('John Doe');
  });
});

Integration Tests

describe('Document Processing', () => {
  it('should process PDF and extract data', async () => {
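    const file = new File(['mock pdf content'], 'test.pdf', {
      type: 'application/pdf'
    });
    // processDocument expects FormData with the file under the 'file' key
    const formData = new FormData();
    formData.append('file', file);
    
    const result = await processDocument(formData);
    
    expect(result.success).toBe(true);
    // result.data is the FHIR bundle returned by convertToFHIR
    expect(result.data.resourceType).toBe('Bundle');
    expect(result.data.entry[0].resource.resourceType).toBe('Patient');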
  });
});

Known Issues

Large File Handling: Files over 10MB may time out
Handwriting Recognition: Poor accuracy on handwritten notes
Table Extraction: Complex tables may lose structure
Image Quality: Low-resolution images fail to process
Language Support: English only currently

Future Enhancements

Batch Processing: Upload multiple files at once
Progress Streaming: Real-time extraction progress
Custom Templates: Define extraction templates per document type
ML Training: Train custom models on hospital-specific formats
Webhook Support: Notify external systems on completion

Security Considerations

PHI Protection: Treat every uploaded document as containing PHI
Encryption: Encrypt files at rest and in transit
Access Control: Limit who can upload documents
Audit Trail: Log all document access
Retention Policy: Auto-delete after processing

Conclusion

The FHIR ingestion system automates medical data extraction from uploaded documents. The current implementation provides a foundation, but production deployment requires addressing the security, performance, and reliability considerations outlined above. Because upload, extraction, conversion, and storage are separate stages, each can be enhanced or adapted to specific healthcare provider needs.
