Dynamic CT Perfusion Image Data Compression for Efficient Parallel Processing

The increasing size of medical imaging data, in particular time series such as CT perfusion (CTP), requires new and fast approaches to deliver timely results for acute care. Cloud architectures based on graphics processing units (GPUs) can provide the processing power required for delivering fast results. However, the size of CTP datasets makes transfers to cloud infrastructures time-consuming and therefore unsuitable in acute situations. To reduce this transfer time, this work proposes a fast and lossless compression algorithm for CTP data. The algorithm exploits redundancies in the temporal dimension and was designed to preserve random read-only access to the image elements directly from the compressed data on the GPU. The algorithm also enables faster transfers of CTP data to GPUs and can speed up GPU processing by more than 30%. Regarding the transfer to cloud architectures, our method is up to 6 times faster than the compression techniques adopted in the DICOM standard. To the best of our knowledge, this is the first work to present a GPU-ready method for medical image compression with random access to the image elements from the compressed data.

context

The datasets used in this study consist of 20 whole-brain volumes with 320 slices of 512×512 voxels with 16 bits/voxel per patient. The patients were scanned as part of a Dutch multi-center randomized trial named MR CLEAN. Approval from the medical ethics committee was obtained, and all patients or their legal representatives signed informed consent. Volumes were acquired every 2 seconds during the first 35 seconds, followed by a scan every 5 seconds until 60 seconds. Subsequently, 5 volumes were scanned at 30-second intervals. Each volume is 160 MB (320 × 512 × 512 voxels × 2 bytes), so the complete dataset amounts to 3840 MB of data that need to be processed quickly for an initial evaluation of the patient. In some cases, an additional CTP dataset is acquired after around 24 hours to evaluate treatment progress, resulting in up to 7.5 GB of data per patient.

Voxel intensities are represented using 16 bits. However, the range of each voxel's values over time is smaller than the range that 16 bits can represent. Therefore, fewer bits can be used to represent exactly the same information by storing the variation of these intensities instead of their absolute values. As observed in the next figure, only 6% of the voxels in the example slice require more than 8 bits to represent their intensity variation over time, and a maximum of 11 bits is required to represent this variation. The figure also illustrates that, due to the characteristics of CTP data, the number of voxels with a large intensity variation over time is rather small. This indicates that the temporal dimension of the CTP data is a significant source of redundancy that can be exploited for compression.
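To make this concrete, the per-voxel bit requirement follows directly from the voxel's temporal minimum and maximum. The sketch below is our illustration, not the authors' code, and the function name is ours:

```cuda
#include <cstdint>

// How many bits are needed to store the temporal variation of one voxel,
// i.e. the offsets of its intensity samples from their temporal minimum.
// `series` holds the voxel's value at each of the `T` time points.
int bitsForVoxel(const uint16_t* series, int T) {
    uint16_t lo = series[0], hi = series[0];
    for (int t = 1; t < T; ++t) {
        if (series[t] < lo) lo = series[t];
        if (series[t] > hi) hi = series[t];
    }
    uint32_t range = hi - lo;                  // variation, not absolute value
    int bits = 0;
    while (range > 0) { ++bits; range >>= 1; }
    return bits;                               // at most 11 for the slice discussed above
}
```

Applied to every voxel of a slice, this yields the per-voxel bit-width distribution summarized in the figure.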

data

Our compression technique exploits these redundancies as follows:

algorithm
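To make the random-access property concrete, the sketch below shows one possible packed layout together with its element-wise decode step: each voxel stores a 16-bit base (its temporal minimum), and its T offsets are packed at a fixed bit width chosen per block of voxels, so the bit position of any element can be computed in closed form. This layout, and the names PackedBlock and decode, are our illustration under those assumptions; the authors' actual format may differ.

```cuda
#include <cstdint>

// One possible random-access layout for temporally delta-encoded voxels.
struct PackedBlock {
    const uint16_t* base;   // one 16-bit base (temporal minimum) per voxel
    const uint32_t* words;  // packed offset stream, padded with one extra word
    int bits;               // fixed offset width for this block (<= 16 here)
    int T;                  // number of time points
};

// Random read-only access: decode voxel v (block-local index) at time t
// without touching any other element of the compressed stream.
__host__ __device__ inline uint16_t decode(const PackedBlock& b, int v, int t) {
    long long bitPos = ((long long)v * b.T + t) * b.bits;
    size_t word = (size_t)(bitPos >> 5);   // 32-bit word containing the offset
    int shift = (int)(bitPos & 31);        // bit position inside that word
    // Read two words so an offset straddling a word boundary is handled.
    uint64_t chunk = (uint64_t)b.words[word] |
                     ((uint64_t)b.words[word + 1] << 32);
    uint32_t mask = (1u << b.bits) - 1u;
    uint32_t offset = (uint32_t)(chunk >> shift) & mask;
    return (uint16_t)(b.base[v] + offset);
}
```

Because decoding an element needs only its own bits plus the per-voxel base, GPU threads can read arbitrary voxels and time points independently, which is the property the method relies on.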

All compression techniques incorporated in the DICOM standard were considered for comparison with our method. However, according to the DICOM specification, MPEG2 and MPEG-4 compression are inherently lossy and were therefore excluded from our comparison. JPEG 2000 lossless was also excluded because it is much slower than the other methods without a significantly better compression ratio. Consequently, only the following techniques from the DICOM standard remain in our comparison:

  • JPEG lossless, more precisely the JPEG process 14 (first-order horizontal prediction [selection value 1], DPCM, non-hierarchical with Huffman coding);
  • JPEG LS lossless; and
  • run-length encoding (RLE), illustrated in the sketch below.
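For reference, DICOM's RLE transfer syntax is built on the byte-oriented PackBits scheme. The simplified encoder below is our sketch of that scheme only; the real DICOM codec additionally splits 16-bit samples into separate byte segments and prepends a 64-byte segment-offset header, both omitted here.

```cuda
#include <cstdint>
#include <vector>

// Simplified PackBits run-length encoder (the core of DICOM RLE).
// A negative count byte c encodes (1 - c) repetitions of the next byte;
// a non-negative count byte c encodes (c + 1) literal bytes.
std::vector<int8_t> packbits(const uint8_t* in, size_t n) {
    std::vector<int8_t> out;
    size_t i = 0;
    while (i < n) {
        size_t run = 1;                  // length of the repeat run at i (max 128)
        while (i + run < n && run < 128 && in[i + run] == in[i]) ++run;
        if (run >= 2) {                  // replicate run
            out.push_back((int8_t)(1 - (int)run));
            out.push_back((int8_t)in[i]);
            i += run;
        } else {                         // literal run of non-repeating bytes
            size_t start = i, lit = 0;
            while (i < n && lit < 128 &&
                   !(i + 1 < n && in[i] == in[i + 1])) { ++i; ++lit; }
            out.push_back((int8_t)(lit - 1));
            for (size_t k = 0; k < lit; ++k)
                out.push_back((int8_t)in[start + k]);
        }
    }
    return out;
}
```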

results 1

The number of processing units used to execute our compression method has a significant impact on its compression time. This scaling behavior is illustrated in the next figure.
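This scaling is possible because slices (or blocks of voxels) can be encoded independently of one another. The sketch below shows one simple way to split the work over P threads; compressSlice is a hypothetical stand-in for the actual per-slice encoder, not a function from the paper.

```cuda
#include <functional>
#include <thread>
#include <vector>

// Distribute independent per-slice compression jobs over P worker threads.
void compressVolume(int numSlices, int P,
                    const std::function<void(int)>& compressSlice) {
    std::vector<std::thread> workers;
    for (int w = 0; w < P; ++w)
        workers.emplace_back([&, w] {
            for (int z = w; z < numSlices; z += P)  // static interleaved split
                compressSlice(z);
        });
    for (auto& t : workers) t.join();               // wait for all slices
}
```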

results 2

Our compression technique also enables GPU processing directly from the compressed data. By processing the compressed CTP data, less data needs to be transferred between host and GPU. This can speed up the total GPU processing time significantly because, in some applications, most of the time of a GPU computation is spent on data transfers. To evaluate this improvement, a GPU application that creates a mask from the CTP data was developed. This application was executed in two ways: using the uncompressed data and using the compressed data generated by our method. In both cases, the time to compute the mask was measured, including the time spent on transfers between host and GPU. Processing the original data took 2818 ± 382 [2664, 4392] milliseconds, while processing the compressed data took 1903 ± 186 [1712, 2668] milliseconds. According to these results, GPU processing using the compressed data was, on average, more than 30% faster than processing the original data.
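As an illustration of such a kernel, the sketch below computes a simple threshold mask while reading intensities directly from the packed layout introduced earlier (PackedBlock and decode). The thresholding rule and all names are our assumptions; the paper only states that a mask is created from the CTP data.

```cuda
// Per-voxel mask computed straight from the compressed stream: only the
// packed words and the base values cross the host-GPU link, not the full
// 16-bit volume. Assumes the PackedBlock/decode sketch shown earlier.
__global__ void maskKernel(PackedBlock blk, int nVoxels,
                           uint16_t threshold, uint8_t* mask) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nVoxels) return;
    uint16_t peak = 0;
    for (int t = 0; t < blk.T; ++t) {
        uint16_t s = decode(blk, v, t);  // random access, no full decompression
        if (s > peak) peak = s;          // temporal maximum of this voxel
    }
    mask[v] = (peak > threshold) ? 1 : 0;
}
```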

This work was funded by ITEA2 10004: Medical Distributed Utilization of Services & Applications (MEDUSA).