MRF Dictionary Calculation and Visualization using GPU Compute Shaders
Andrew Dupuis1,2, Dan Ma3, and Mark A Griswold1,2,3

1Biomedical Engineering, Case Western Reserve University, Cleveland, OH, United States, 2Interactive Commons, Case Western Reserve University, Cleveland, OH, United States, 3Department of Radiology, School of Medicine, Case Western Reserve University, Cleveland, OH, United States


Dictionary generation for Magnetic Resonance Fingerprinting (MRF) can be a computationally intensive procedure, especially as complexity and density increase. Conveniently, the majority of operations required for calculating dictionary entries are already available in conventional computer graphics shader languages. Here, we leverage decades of research and hardware development in computer graphics optimization to remove the need for CUDA parallelization and instead render MRF dictionaries directly into compressible video files in virtually real time.


The need for rapid generation of dictionaries for Magnetic Resonance Fingerprinting (MRF) [1] is significant. With rising interest in “real-time” MRF and the corresponding need for patient-specific dictionaries, for example in cardiac-gated exams, the ability to rapidly generate dictionaries while at the scanner has gained importance. Much existing research has focused on optimizing the dictionary generation process in MATLAB or other research software packages. However, these packages carry significant overhead, and many optimization approaches require CUDA-capable hardware. Here we instead focus on a universal GPU-accelerated dictionary generation technique that uses standard computer graphics (CG) shaders to perform the simulation, theoretically allowing any computer with even basic rendering capabilities to benefit from the acceleration.


Development of the system was performed in the Unity Engine to allow for rapid shader iteration and debugging, but the shaders developed are platform agnostic and can be used in either DirectX or OpenGL-based implementations.

A compute shader is used to perform the dictionary simulation. Compute shaders implement the standard CG language functions in a runtime format that allows additional datatype and dispatching flexibility. The dictionary to be rendered is defined by minimum and maximum T1 and T2 values, as well as the resolution percentage used for each “step” in the dictionary. Rather than using a constant step size across the whole dictionary, the resolution at any point follows a geometric progression, with a constant percentage change between consecutive entries. Consequently, increases in the percentile resolution result in exponential growth of the total number of dictionary entries. Together, these variables define the resolution of the rendered output. Additional inputs include text files defining the flip angle, phase, and TR for the dictionary.
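The geometric step sizing above can be sketched on the CPU before dispatch. The function name and ranges below are illustrative (the 5% step and the T1/T2 ranges match those used in the timing tests later), not the abstract's actual implementation:

```python
import math

def geometric_steps(lo_ms, hi_ms, pct):
    """Dictionary entries from lo_ms to hi_ms with a constant percentage
    change between consecutive values (geometric progression)."""
    vals = []
    v = lo_ms
    while v <= hi_ms:
        vals.append(v)
        v *= (1.0 + pct)
    return vals

# 5% steps over T1 = 20-4000 ms and T2 = 2-400 ms
t1_values = geometric_steps(20.0, 4000.0, 0.05)
t2_values = geometric_steps(2.0, 400.0, 0.05)
total_entries = len(t1_values) * len(t2_values)
# Shrinking the step percentage lengthens both axes at once, so the total
# entry count grows rapidly as the percentile resolution is refined.
```

Each axis holds one entry per multiplicative step, so the entry count per axis scales with log(hi/lo)/log(1+pct), and the two-dimensional dictionary grows as the product of the two axes.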

The T1 and T2 space is divided among an array of compute groups to parallelize the computation. An RGB texture matching the resolution of the dictionary in T1/T2 space is initialized, and initial magnetizations are set. A standard Bloch simulation then proceeds for one timestep, with the magnetization at the end of the RF pulse and at TE rendered to textures. Additionally, the real and imaginary components of the magnetization are rendered to the red and green channels of the master dictionary texture. At each timestep, the previous timestep’s magnetizations are used as inputs, and an additional frame is added to the dictionary rendering.
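The per-thread update can be sketched on the CPU as a minimal hard-pulse Bloch step. The conventions below (instantaneous rotation about x, equilibrium magnetization normalized to 1, relaxation applied over the full TR) are assumptions for illustration, not the abstract's actual shader code:

```python
import math

def bloch_tr_step(m, flip_rad, t1_ms, t2_ms, tr_ms):
    """One TR of a hard-pulse Bloch simulation, as each shader thread would
    perform it for its (T1, T2) pair. m = (mx, my, mz)."""
    mx, my, mz = m
    # RF pulse: instantaneous rotation about the x-axis by the flip angle
    my, mz = (my * math.cos(flip_rad) + mz * math.sin(flip_rad),
              -my * math.sin(flip_rad) + mz * math.cos(flip_rad))
    # Free relaxation over the repetition time
    e1 = math.exp(-tr_ms / t1_ms)
    e2 = math.exp(-tr_ms / t2_ms)
    mx, my = mx * e2, my * e2
    mz = 1.0 + (mz - 1.0) * e1  # recovery toward equilibrium M0 = 1
    return (mx, my, mz)

# One thread's trajectory: the previous timestep's magnetization feeds the
# next, and (mx, my) would be written to the red/green channels per frame.
m = (0.0, 0.0, 1.0)
signal = []
for flip_deg in [60.0] * 5:  # hypothetical constant flip-angle train
    m = bloch_tr_step(m, math.radians(flip_deg), 1000.0, 100.0, 12.0)
    signal.append((m[0], m[1]))
```

In the shader, this loop body runs once per dispatched frame, with the magnetization state carried between dispatches in the textures described above.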

This process proceeds for as many frames/timesteps as desired, with the final output being a video file containing the dictionary simulation’s values.


Dictionary generation speed was tested with an isochromat simulation over 1000 TRs on a T1 range of 20 to 4000 ms and a T2 range of 2 to 400 ms on a Windows 10 PC with a Xeon E5-2697 CPU, 64 GB of RAM, and an NVIDIA GTX 1080 Ti GPU. The percent step size of T1 and T2 was varied. Complete timing comparisons at various dictionary sizes are shown in Figure 2. The average simulation time reduction was 96.4% across all tested dictionary resolutions.


The speed benefits of this preliminary work are substantial, and further shader optimization may yield even greater performance gains. Input/output latency is the primary limitation of the system in its current state, with GPU-to-CPU transfer speeds limiting the ability of the shader system to free-run as quickly as it could. However, this limitation lies primarily in the implementation rather than the theory.

Future work will allow the dictionary to be calculated either as an isochromat or as a group of spins with variable spin numbers and time steps. Modifications should also be made to the work assignment system to skip physically impossible combinations of T1 and T2 values, such as those where T2 is longer than T1.
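The proposed work-assignment filter could look like the sketch below; the predicate name is hypothetical:

```python
def physically_valid(t1_ms, t2_ms):
    """Skip (T1, T2) pairs where T2 exceeds T1, since those combinations
    are not physically realizable and would waste shader threads."""
    return t2_ms <= t1_ms

# Example: prune the invalid corner of a small grid before dispatch
pairs = [(t1, t2)
         for t1 in (20.0, 100.0, 1000.0)
         for t2 in (2.0, 50.0, 400.0)
         if physically_valid(t1, t2)]
```

Applied at dispatch time, this would also remove the null region visible in Figure 1 where T2 exceeds T1.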

With a render-based approach, dictionaries are not calculated into arrays but are instead rendered into RGB video files. While this is a limitation with current reconstruction systems, requiring conversion into a more traditional format that reduces the overall performance gain, the new storage approach introduces opportunities in dictionary evaluation and compression. First, dictionaries can be “watched” along the TR dimension, with clear visualization of the magnetization at different T1 and T2 combinations. This allows an MRF sequence designer to see and understand the effects of a sequence across the T1-T2 domain. Second, using video files as a storage medium opens paths to integrating existing compression and streaming research, allowing storage or in-situ streaming of high-resolution dictionaries without the need for massive, uncompressed, fully sampled dictionaries.
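The conversion between magnetization values and video pixel channels could be as simple as the mapping below. The symmetric 8-bit scaling is an assumption for illustration; the abstract does not specify the shader's actual quantization:

```python
def encode_rgb(mx, my):
    """Map signed magnetization components in [-1, 1] to 8-bit red/green
    channel values, as one pixel of a dictionary video frame."""
    def to8(v):
        return max(0, min(255, round((v + 1.0) * 127.5)))
    return (to8(mx), to8(my), 0)  # blue channel unused

def decode_rgb(r, g):
    """Inverse mapping back to (mx, my) for use by a reconstruction that
    consumes the rendered video."""
    def from8(c):
        return c / 127.5 - 1.0
    return (from8(r), from8(g))
```

An 8-bit mapping quantizes the signal, which is part of the compression trade-off discussed above; higher-bit-depth video formats would reduce that loss at the cost of larger files.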


Siemens Healthcare, R01EB018108, NSF 1563805, R01DK098503, and R01HL094557.


[1] Ma, D., Gulani, V., Seiberlich, N., Liu, K., Sunshine, J., Duerk, J., and Griswold, M.A. Magnetic Resonance Fingerprinting. Nature. 2013;495(7440):187-192.


Figure 1: (click for animation) A rendered dictionary depicting the magnitude of the real (red, values too low to be visible) and imaginary (green) components of a calculated dictionary. The TR domain is played back in time via the GIF. T1 entries span from 20 to 4000 ms from bottom to top, each incremented by 5% of the prior value. T2 entries similarly span from 2 to 400 ms by 5% from left to right. Note the pattern of null values in the region where T2 exceeds T1, as expected.

Figure 2: Performance data for isochromat simulation over 1000 TRs on a T1 range of 20 to 4000 ms and a T2 range of 2 to 400 ms. The percent step size of T1 and T2 was varied.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)