DEViCE: Data Extraction from Visual Capture for Examination

Part 1: The Clinical Problem

At my workplace, a peculiar problem exists: essential medical devices like ventilators and mechanical circulatory support systems often lack a direct data link to the electronic health record (EHR), even though some modern models have this capability. This forces healthcare providers (primarily nurses and respiratory therapists) to manually write critical patient data onto paper before typing it into the computer. Some attempt to chart directly at the bedside, but small on-screen text and challenging room layouts make that difficult. This repeated cycle of writing and transcribing, the act of manually entering the same vital information multiple times across different systems, is known as double-handling, and it is a primary source of inefficiency and risk. It creates data silos, increases the likelihood of human errors like miswriting or misreading figures (especially with poor handwriting), and ultimately slows down patient care, underscoring the urgent need for automation or better integration.

To solve this, I developed an iOS app called DEViCE, which stands for Data Extraction from Visual Capture for Examination. The main goal is simple: Eliminate human error and delay in recording vital data from these devices. Initially, this project focuses on Impella devices. However, as the data extraction process from photos is refined, I plan to incorporate more device types.

An Impella is a specialized cardiac support device. Its monitor displays critical information, such as pump level, supported cardiac output, Impella flow, hemodynamics, and purge infusion information, which must be manually copied into the patient's chart every hour. My objective was to create an app that could instantly read a photo of this monitor, saving valuable time and ensuring accuracy. The app's core function is to use the Vision framework's Optical Character Recognition (OCR) to automatically extract key data points, such as P-Level, Flow, Placement Signals, Purge Pressure/Rate, Cardiac Output, Cardiac Power Output, and Motor Current, directly from a photograph of the device screen.
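
To make this concrete, here is a minimal sketch of the kind of Vision text-recognition request the app is built around, assuming a CGImage captured from the camera. The function names and the single-field parsing at the end are illustrative stand-ins, not the app's actual code:

```swift
import Foundation
import Vision
import CoreGraphics

// Run Vision's text recognizer on a captured frame and hand back the
// recognized lines. (extractLines is a hypothetical helper name.)
func extractLines(from cgImage: CGImage, completion: @escaping ([String]) -> Void) {
    let request = VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        // Keep only each observation's best candidate string.
        completion(observations.compactMap { $0.topCandidates(1).first?.string })
    }
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = false  // device readouts aren't natural language

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        do { try handler.perform([request]) } catch { completion([]) }
    }
}

// Illustrative parsing step: pull a decimal value out of a line such as "Flow 2.5".
func flowValue(in lines: [String]) -> Double? {
    for line in lines where line.localizedCaseInsensitiveContains("flow") {
        if let range = line.range(of: #"[0-9]+\.[0-9]+"#, options: .regularExpression) {
            return Double(String(line[range]))
        }
    }
    return nil
}
```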

Part 2: Initial Failure: Geometry, Shifting Images, and the Overlay Gambit

The initial foray into developing this system leveraged the built-in Vision framework for OCR. That approach was plagued by reliability issues, primarily stemming from camera alignment: small shifts in angle or position frequently led to misreading the critical information. To counteract this, a grid overlay was introduced as a user aid, intended to keep the camera precisely aligned; it offered some marginal improvement, but it wasn't a definitive fix.
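
For reference, an alignment aid like that can be a few lines of SwiftUI. The sketch below is a hypothetical reconstruction (the GridOverlay name and the row/column counts are arbitrary choices): it draws static reference lines over the camera preview without intercepting touches.

```swift
import SwiftUI

// Hypothetical alignment aid: static grid lines drawn over the camera
// preview so the user can square the monitor's bezel against them.
struct GridOverlay: View {
    var rows = 4
    var columns = 6

    var body: some View {
        GeometryReader { geo in
            Path { path in
                let w = geo.size.width
                let h = geo.size.height
                for r in 1..<rows {                      // horizontal lines
                    let y = h * CGFloat(r) / CGFloat(rows)
                    path.move(to: CGPoint(x: 0, y: y))
                    path.addLine(to: CGPoint(x: w, y: y))
                }
                for c in 1..<columns {                   // vertical lines
                    let x = w * CGFloat(c) / CGFloat(columns)
                    path.move(to: CGPoint(x: x, y: 0))
                    path.addLine(to: CGPoint(x: x, y: h))
                }
            }
            .stroke(Color.white.opacity(0.35), lineWidth: 1)
        }
        .allowsHitTesting(false)  // purely visual; touches pass through
    }
}
```

A camera preview view would then apply it with something like .overlay(GridOverlay()).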

Part 3: The Pivot to Complexity: Synthetic Data and Machine Learning

This prompted a significant pivot toward more complex, external solutions. Experiments were conducted with CVAT and CreateML, and later with Tesseract OCR. This investigative phase required introducing Python scripting and generating numerous test images by compositing values onto a blank Impella screen layer, purely to produce training data for annotation in CVAT. Ultimately, these solutions proved overly convoluted, adding unnecessary complexity to the entire development process.

Part 4: Returning to the Roots: Refining Simple OCR for Practical Accuracy

Recognizing the need for a simpler, more robust core, development returned to its roots, starting from scratch with a renewed focus on refining the native Vision OCR. The key to unlocking greater accuracy was surprisingly simple: instructing the user to capture the photo in landscape orientation, so that the screen containing the data fully occupies the frame. A second crucial realization was that 'less is more' regarding the captured image content. Including the entire screen, particularly the top section with dates and serial numbers, flooded the pipeline with extraneous numerical data and severely hindered its ability to parse the essential information.
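
One cheap way to enforce that 'less is more' rule in code, assuming the header always occupies the top band of the frame, is Vision's built-in region-of-interest support; the 0.75 fraction below is an illustrative value, not the one the app ships with:

```swift
import Vision
import CoreGraphics

// Restrict recognition to the lower portion of the image so header text
// (dates, serial numbers) never reaches the parser in the first place.
func makeFocusedTextRequest() -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = false
    // regionOfInterest uses normalized, lower-left-origin coordinates:
    // this box keeps the bottom 75% of the frame and drops the top band.
    request.regionOfInterest = CGRect(x: 0, y: 0, width: 1.0, height: 0.75)
    return request
}
```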

The system, while not flawless, has now achieved a respectable accuracy rate of approximately 80%. Critically, for the remaining 20% of cases where incorrect data is extracted, a built-in verification step allows the user to manually edit the fields before saving, ensuring data integrity.
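
That verification step can be as simple as an editable form shown before saving. This sketch is a hypothetical stand-in (VerificationView, the three fields, and the onSave callback are all illustrative, and the real app tracks more values):

```swift
import SwiftUI

// Hypothetical review screen: every extracted value lands in an editable
// field, so nothing reaches the record without a human glance.
struct VerificationView: View {
    @State var pLevel: String
    @State var flow: String
    @State var purgePressure: String
    var onSave: (_ pLevel: String, _ flow: String, _ purgePressure: String) -> Void

    var body: some View {
        Form {
            Section("Confirm extracted values") {
                TextField("P-Level", text: $pLevel)
                TextField("Flow (L/min)", text: $flow)
                TextField("Purge Pressure (mmHg)", text: $purgePressure)
            }
            Button("Save") {
                onSave(pLevel, flow, purgePressure)
            }
        }
    }
}
```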

Part 5: The Result: Automation Plus Integrity

The journey of developing DEViCE has been defined by iteration, setbacks, and, ultimately, practical success. I initially chased the promise of perfect computer vision and machine learning, only to find that the simplest, most human-centered refinements provided the most immediate value. The system, relying on streamlined native OCR, landscape orientation, and focused image capture, now reliably extracts the necessary data with approximately 80% accuracy.

This accomplishment is more than a technical milestone; it is a tangible improvement in clinical workflow, turning a multi-minute, error-prone transcription task into a quick, verified snap-and-save process. By integrating a mandatory manual verification and editing step that catches the remaining 20% of cases, the app ensures data integrity is never compromised, proving that effective medical technology often lies in the synergy between smart automation and intuitive human oversight.

The DEViCE project is still ongoing, and I remain committed to the long-term goal of a more robust, machine-learning-driven solution that can handle an even greater variety of devices and lighting conditions. However, after eight months of intensive development, I am incredibly pleased with what has been produced: a functioning application that eliminates the risks of double-handling and transcription errors today, and a tested, valuable foundation for the advanced machine learning initiatives of tomorrow.