Johannes Schmidt, Nordakademie, Germany
Arne Ewald, Nordakademie, Germany
This paper evaluates the ability of Large Language Models (LLMs) to generate syntactically and semantically correct FHIR REST queries from natural language for retrieving medical data from Clinical Data Repositories (CDRs). The goal is to explore natural language interfaces that can improve clinical data access and interoperability across healthcare systems. Six experiments were conducted with nine LLMs, comparing baseline prompting against structured prompts, few-shot examples, and feedback loops using HTTP error codes or messages. Results show that even without external tools, several models achieve high syntactic validity, with accuracy further improved by prompt engineering and simple feedback mechanisms. However, semantic correctness remains challenging, particularly for medical codes, date logic, and site-specific conventions. Error analyses demonstrate where Retrieval-Augmented Generation (RAG), terminology services, and agentic repair could provide immediate gains, making this work a valuable prompt-centric baseline for the next generation of tool-augmented clinical query systems.