Quantitative Assessment of Whole-Spine T2 MRI Using Deep Learning

June 7, 2023


We present a fully automatic system for the quantitative assessment of discs and vertebrae using convolutional neural networks. The proposed algorithm works in three stages: 1) segmentation/identification of spinal anatomy; 2) curvature analysis; and 3) detection of pathological conditions of intervertebral discs. We validate the proposed approach on a large dataset of 1,500 subjects with sagittal T2-weighted whole spine MRI, obtained as part of a whole body MRI protocol in a preventative health screening program.


We present an AI-based solution to: 1) automatically segment the spine; 2) classify the curvature; and 3) detect spondylosis in intervertebral discs. We validate this solution on 1,500 patients and find it accurate for identifying pathological cases in the cervical and lumbar regions.


Spine curvature disorders and spondylosis are common among adults of all ages [1]. Visual analysis of the spine on Magnetic Resonance Imaging (MRI) plays a significant role in the assessment and diagnosis of spinal disorders. However, MRI acquisition is time-consuming and generating radiology reports is expensive. We address the former by acquiring sagittal T2-weighted whole-spine images as part of a whole-body MRI protocol in a proactive health screening program. We address the latter by developing AI-based tools that target the most common spinal disorders and automate key time-consuming steps of spinal MRI interpretation, in order to improve radiologist read-time efficiency.

Material and Methods:

We trained an nnU-Net [2] model on sagittal T2 volumes acquired from 80 patients to segment cervical, thoracic, lumbar and sacral vertebrae, intervertebral discs (IVDs), ribs, cerebrospinal fluid (CSF) and the spinal cord.

Accurate identification of vertebrae is extremely important to prevent spinal surgery at the wrong level [3]. To identify vertebrae and IVDs, we first apply morphological operations to remove small artifacts and separate touching objects. We then sort the anatomical objects in the superior-to-inferior direction and label them sequentially from C2 to S2, counting six cervical, twelve thoracic, five lumbar and two sacral vertebrae. To evaluate the accuracy of the segmentation model, we used the Dice coefficient: twice the volume of the intersection of the predicted and reference masks divided by the sum of their volumes, Dice = 2|P ∩ T| / (|P| + |T|).
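A minimal sketch of the labeling and evaluation steps, assuming each vertebra instance has already been separated and reduced to a single superior-inferior centroid coordinate. The names `VERTEBRA_LABELS`, `label_vertebrae` and `dice` are illustrative, not taken from the original pipeline, and a real implementation would also need to handle missed or merged vertebrae:

```python
import numpy as np

# Label sequence from C2 to S2: six cervical (C2-C7), twelve thoracic,
# five lumbar and two sacral vertebrae (25 labels in total).
VERTEBRA_LABELS = (
    [f"C{i}" for i in range(2, 8)]
    + [f"T{i}" for i in range(1, 13)]
    + [f"L{i}" for i in range(1, 6)]
    + ["S1", "S2"]
)


def label_vertebrae(centroids_si):
    """Map vertebra instance index -> anatomical label (hypothetical helper).

    centroids_si: superior-inferior centroid coordinate of each instance,
    assumed to increase towards the feet. Assumes the topmost object is C2
    and that no vertebra was missed or merged; instances beyond the 25th
    are left unlabeled.
    """
    order = np.argsort(centroids_si)  # top-to-bottom ordering
    return {int(idx): VERTEBRA_LABELS[rank]
            for rank, idx in enumerate(order[:len(VERTEBRA_LABELS)])}


def dice(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement by convention
    return 2.0 * np.logical_and(pred, target).sum() / denom


# Example: two 4x4 masks of 8 pixels each that overlap in 4 pixels.
a = np.zeros((4, 4), dtype=bool)
a[:2] = True
b = np.zeros((4, 4), dtype=bool)
b[1:3] = True
print(dice(a, b))  # 0.5
```

Per-structure Dice values computed this way can then be averaged across subjects to give mean ± standard deviation figures like those reported below.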

The vertebra segmentation mask is then used for curvature analysis, classifying each spinal region as normal, reversed, straightened or accentuated. To this end, a decision tree is trained on the coefficients of a B-spline curve fitted to the vertebral centroids.
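The spline settings are not specified above, so the following is only a sketch under stated assumptions: a cubic, clamped B-spline with one interior knot is least-squares fitted to the anterior-posterior offset of the centroids as a function of normalized superior-inferior position, after subtracting the straight chord between the end vertebrae. The fitted coefficients form the feature vector. The helper names, the knot placement and the chord subtraction are all assumptions, and the basis is evaluated with a plain Cox-de Boor recursion to keep the example dependency-light:

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion).

    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree-0 basis: indicator of each knot interval [t_i, t_{i+1}).
    B = np.zeros((x.size, knots.size - 1))
    for i in range(knots.size - 1):
        B[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    # Include the right end point in the last non-empty interval.
    last = np.nonzero(knots[:-1] < knots[1:])[0].max()
    B[x == knots[-1], last] = 1.0
    # Raise the degree one step at a time; 0/0 terms are defined as 0.
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, knots.size - 1 - d))
        for i in range(knots.size - 1 - d):
            den_l = knots[i + d] - knots[i]
            den_r = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / den_l * B[:, i] if den_l > 0 else 0.0
            right = (knots[i + d + 1] - x) / den_r * B[:, i + 1] if den_r > 0 else 0.0
            B_next[:, i] = left + right
        B = B_next
    return B


def curvature_features(centroids, interior_knots=(0.5,), degree=3):
    """B-spline coefficients describing the sagittal shape of one spinal region.

    centroids: (N, 2) array of vertebral centroids in a sagittal slice,
    columns = (superior-inferior, anterior-posterior) position in mm.
    """
    c = np.asarray(centroids, dtype=float)
    c = c[np.argsort(c[:, 0])]                   # order top to bottom
    si, ap = c[:, 0], c[:, 1]
    u = (si - si[0]) / (si[-1] - si[0])          # normalize position to [0, 1]
    # Offset from the straight chord joining the end vertebrae.
    deviation = ap - (ap[0] + (ap[-1] - ap[0]) * u)
    # Clamped knot vector: repeated end knots plus the interior knots.
    knots = np.concatenate([np.zeros(degree + 1), interior_knots, np.ones(degree + 1)])
    B = bspline_basis(u, knots, degree)
    coef, *_ = np.linalg.lstsq(B, deviation, rcond=None)
    return coef
```

Subtracting the chord means a perfectly straight segment yields all-zero coefficients whatever its tilt, so the features encode shape rather than posture. The resulting vectors, one per region, could then be passed to any decision tree implementation (for example scikit-learn's `DecisionTreeClassifier`) with the radiologists' curvature labels as targets.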

The IVD segmentation mask is used to extract a local 3D patch around each disc. These patches are used to train a shallow convolutional classification model (Figure 1) that distinguishes healthy discs from discs with spondylosis.
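A minimal sketch of the patch extraction, assuming one binary mask per disc and using the 16x32x64 input size given in the Figure 1 caption. How those three sizes map to anatomical axes, the zero-padding at the image border and the function name `extract_disc_patch` are assumptions:

```python
import numpy as np


def extract_disc_patch(volume, disc_mask, patch_shape=(16, 32, 64)):
    """Crop a fixed-size 3D patch centered on one intervertebral disc.

    volume: 3D grayscale image; disc_mask: boolean mask of one disc with the
    same shape. Parts of the patch falling outside the volume are zero-padded.
    """
    center = np.round(np.argwhere(disc_mask).mean(axis=0)).astype(int)
    patch = np.zeros(patch_shape, dtype=volume.dtype)
    src, dst = [], []
    for c, size, dim in zip(center, patch_shape, volume.shape):
        start = c - size // 2                     # may be negative near borders
        lo, hi = max(start, 0), min(start + size, dim)
        src.append(slice(lo, hi))                 # region read from the volume
        dst.append(slice(lo - start, hi - start)) # where it lands in the patch
    patch[tuple(dst)] = volume[tuple(src)]
    return patch


vol = np.arange(20 * 40 * 80, dtype=np.float32).reshape(20, 40, 80)
mask = np.zeros(vol.shape, dtype=bool)
mask[10, 20, 40] = True
patch = extract_disc_patch(vol, mask)
print(patch.shape)  # (16, 32, 64)
```

A fixed patch size lets every disc, at any spinal level, be fed to the same classifier.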


Processing a single T2 volume with our approach takes ~7 minutes on a g4dn.xlarge AWS instance, most of which (~5 minutes) is spent in the nnU-Net segmentation module. The mean Dice across all discs and vertebrae is 0.87 ± 0.02 and 0.85 ± 0.06, respectively, which is sufficiently accurate for automatically annotating the anatomy in our whole-spine MRI.

Results of the curvature classification model are shown in Figure 2: the decision tree accurately separates normal curvature from curvature pathologies. Note that normal cases greatly outnumber pathological ones, because radiologists underreport mild curvature deviations when examining the spine on MRI.

To quantify the accuracy of the spondylosis classification model, we calculated its sensitivity and specificity for detecting spondylosis vs. healthy discs on a hold-out set of 1,500 patients (Table 1). The average sensitivity for the cervical, thoracic and lumbar regions is 0.74 ± 0.28, 0.27 ± 0.21 and 0.88 ± 0.06, respectively. The average specificity for the same regions is 0.59 ± 0.27, 0.90 ± 0.07 and 0.62 ± 0.20, respectively.
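For reference, per-level metrics and a regional summary can be computed as below. This sketch assumes that each ± value is the standard deviation across disc levels within a region (not stated above) and that a metric is skipped at levels where it is undefined, such as sensitivity at a level with no spondylosis cases; the function names are illustrative:

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Labels are binary: 1 = spondylosis, 0 = healthy. Returns NaN for a
    metric whose denominator is zero (e.g. no positive cases at a level).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def region_summary(per_disc):
    """Mean and (population) standard deviation over disc levels, ignoring NaNs.

    per_disc: dict mapping a disc level name -> (y_true, y_pred).
    Returns two arrays: [mean_sens, mean_spec] and [std_sens, std_spec].
    """
    scores = np.array([sensitivity_specificity(t, p) for t, p in per_disc.values()])
    return np.nanmean(scores, axis=0), np.nanstd(scores, axis=0)


sens, spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(sens, spec)  # 0.5 and 2/3
```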


We presented a fully automatic system for the quantitative assessment of discs and vertebrae. Figure 3 shows an example T2 image annotated with the output of the proposed system. We acknowledge that sensitivity is low in the thoracic region, which may be due to the highly imbalanced nature of our data: spondylosis is extremely rare at some thoracic levels in our cohort. For example, at T4 there are only five spondylosis cases among 483 samples (~1%).
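The T4 figures make the effect of this imbalance concrete. A quick calculation using only the numbers quoted above:

```python
# Class balance at T4: 5 spondylosis cases among 483 discs.
positives, total = 5, 483
prevalence = positives / total
print(f"prevalence = {prevalence:.1%}")  # prevalence = 1.0%

# With only 5 positives, sensitivity can take just six values,
# so a single missed case moves it by 0.2.
possible_sensitivities = [tp / positives for tp in range(positives + 1)]
print(possible_sensitivities)  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# A trivial classifier that always predicts "healthy" would still be ~99% accurate.
accuracy_all_healthy = (total - positives) / total
print(f"{accuracy_all_healthy:.3f}")  # 0.990
```

With so few positives, each per-level sensitivity estimate rests on a handful of cases and can swing sharply with a single misclassification, while accuracy alone would hide the problem entirely.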

Upon inspecting failure cases and consulting with collaborating radiologists, we noticed that in both the curvature and the spondylosis classification models, failure cases lie in between the healthy and pathological classes, where assignment to either class is highly subjective, even among radiologists.


The proposed pipeline is a powerful tool that helps our collaborating radiologists count vertebrae and highlight potential spondylosis. However, its current sensitivity and specificity are not high enough to fully automate spondylosis detection.

In the future, we plan to improve the accuracy of our spondylosis predictions by enlarging the training dataset and using more sophisticated classification models.

Figure 1: Top row: the building blocks used in the binary classifier. Left) 3D convolution followed by ReLU activation; middle) 3D pooling layer that condenses the features into a more compact representation with wider spatial context; right) fully connected layer followed by the softmax function for binary classification.

Bottom row: the architecture of the proposed binary classifier. The input to the model is a grayscale 16x32x64 volume centered on the disc of interest. This volume is processed by a series of convolutional blocks, each defined by its input channels (i), output channels (o), kernel size (k) and pooling (p). Note that while the number of channels grows as the input is processed, the spatial resolution decreases. At the bottleneck, global average pooling within each channel flattens the volume into a vector. Finally, a fully connected layer followed by softmax generates the output classification.
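Because the figure itself is not reproduced here, the sketch below only illustrates the shape bookkeeping the caption describes, using hypothetical (i, o, k, p) values and assuming "same"-padded convolutions so that only pooling changes the spatial size, followed by the pooling, fully connected and softmax head written in NumPy. It is not the authors' network:

```python
import numpy as np

# Hypothetical block configuration (i, o, k, p); the real values are in Figure 1.
BLOCKS = [(1, 16, 3, 2), (16, 32, 3, 2), (32, 64, 3, 2)]


def trace_shapes(input_shape=(1, 16, 32, 64), blocks=BLOCKS):
    """Return the (channels, depth, height, width) shape after each block.

    Assumes 'same' padding, so the kernel size k does not change the shape
    and each block's pooling factor p divides every spatial dimension.
    """
    c, *spatial = input_shape
    shapes = [tuple(input_shape)]
    for i, o, k, p in blocks:
        assert i == c, "input channels must match the previous block's output"
        spatial = [s // p for s in spatial]
        c = o
        shapes.append((c, *spatial))
    return shapes


def classifier_head(features, weights, bias):
    """Global average pooling per channel, fully connected layer, softmax."""
    pooled = features.mean(axis=(1, 2, 3))   # (C, D, H, W) -> (C,)
    logits = weights @ pooled + bias         # (2, C) @ (C,) -> (2,)
    exp = np.exp(logits - logits.max())      # subtract max for numerical stability
    return exp / exp.sum()


shapes = trace_shapes()
print(shapes[-1])  # (64, 2, 4, 8)
rng = np.random.default_rng(0)
probs = classifier_head(rng.standard_normal(shapes[-1]), rng.standard_normal((2, 64)), np.zeros(2))
```

The trace shows the pattern the caption describes: channels grow (1 to 64) while each spatial dimension shrinks by the pooling factor at every block.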

Figure 2: From left to right, curvature classification results against ground-truth labels reported by radiologists in the cervical, thoracic and lumbar regions. In all regions, the B-spline curvature features correctly distinguish between the different conditions.

Table 1: Sensitivity and specificity values for spondylosis detection of various discs. Discs at C7, T1, T3 and T11 are missing because our cohort did not include spondylosis cases at these levels.

Figure 3: Sample annotated sagittal T2 MRI slice overlaid with vertebrae, discs, curvature and spondylosis predictions generated by the proposed AI solution.


[1] Tsuji, T., Matsuyama, Y., Sato, K., Hasegawa, Y., Yimin, Y., & Iwata, H. (2001). Epidemiology of low back pain in the elderly: correlation with lumbar lordosis. Journal of Orthopaedic Science, 6(4), 307–311.

[2] Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2020). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 1-9.

[3] Shah, M., et al. (2020). Anatomical variations that can lead to spine surgery at the wrong level: part II thoracic spine. Cureus, 12(6).
