Description

Introduction
Framework
Evaluation
Data format
Submissions
Results on website

Introduction

Reliably obtaining a carotid bifurcation lumen segmentation and stenosis grading from computed tomography angiography (CTA) data is relevant in clinical practice. This evaluation framework provides a large-scale, standardized evaluation methodology and reference database for the quantitative evaluation of carotid bifurcation lumen segmentation and stenosis grading algorithms. Using this framework, different methods can be compared in an objective, standardized way.

Well-defined measures are presented, and a multi-site, multi-vendor database containing 56 carotid CTA datasets with a corresponding reference standard is described and made available. Several methods are also available to extract statistics from the evaluation results.

Using this framework is simple; just follow these steps:

  1. Register as a team and send in the signed data confidentiality form.
  2. Download the datasets.
  3. After processing the datasets, upload the resulting segmentations and/or stenosis gradings.
  4. The evaluation measures are determined automatically by software running on the website, using a reference standard obtained by averaging the results of three human observers.
  5. If your submission was processed successfully without errors, send an e-mail to the organizers to confirm your submission.
  6. After the processing has been confirmed, you can view your results.
  7. To make the results visible to everyone, write a paper about your method and send it to the organizers, or publish it in a journal and send us a link to it. If you are a commercial company, you can instead give a description of how you obtained the results (details below).
Currently, 11 carotid bifurcation lumen segmentation methods and 4 stenosis grading methods have been evaluated with the framework.

Evaluation framework

More details about the evaluation framework (data, reference standard, measures, scores and ranking) can be found below and in this document.

Framework test

The CLS 2009 framework was tested during one of the challenges of the 3rd MICCAI Workshop in the series "3D Segmentation in the Clinic: A Grand Challenge III", which was held on 24 September 2009 at the 12th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The proceedings of this workshop can be found on the MIDAS Journal website. The framework is open for new submissions.

Framework use

The framework can be used to evaluate methods that perform:

  • Carotid bifurcation lumen segmentation
  • Internal carotid artery stenosis grading

Each team can participate in either one of the tasks, or in both. This page briefly describes the tasks to be performed, the data used, the manual annotation, the reference standard, and the evaluation criteria.

Lumen Segmentation

The Common Carotid Artery (CCA) and the Internal Carotid Artery (ICA) (see Fig. 1) are clinically the most relevant arteries of the carotid bifurcation. Therefore, the segmentation evaluation focuses on these two arteries. A small part of the External Carotid Artery (ECA) is also included, to prevent evaluation issues at the location where the ECA and the ICA separate. Including it also means that a complete bifurcation is part of the evaluation.

The goal of this category is to accurately segment the lumen of the Carotid Bifurcation in a Computed Tomography Angiography (CTA) dataset. There are two versions: a fully automated version, and a semi-automated version where three initial points are provided.

The region to be segmented is defined around the bifurcation slice, which we define as the first slice (going from caudal to cranial) in which the lumen of the CCA appears as two separate lumens: the lumen of the ICA and the lumen of the ECA. The segmentation must contain the CCA, starting at least 20 mm caudal of the bifurcation slice; the ICA, up to at least 40 mm cranial of the bifurcation slice; and the ECA, up to between 10 and 20 mm cranial of the bifurcation slice (see also Fig. 1).


Figure 1: Schematic depiction of the relevant region of interest, and a rendering of this region for one of the datasets.

The performance measures are determined only over the region of interest specified above. However, the bifurcation slice is not communicated to the participants, so participants should make sure that their segmentation includes at least this region. Our definition of the bifurcation slice, together with the specified extents, should be sufficient to determine a suitable region of interest for the segmentations.

For the External Carotid Artery, the segmented lumen should be cut between 10 and 20 mm cranial of the bifurcation slice. To allow for some flexibility in where the ECA is cut, the region around the ECA between 10 and 20 mm cranial of the bifurcation slice is a "masked" region, which is excluded from all evaluation measures (see also Fig. 1).
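The region rules above can be read as a lookup from an artery and a signed cranio-caudal offset relative to the bifurcation slice to one of three states: evaluated, masked, or outside the region of interest. The sketch below is a hypothetical helper, not the organizers' evaluation code; it assumes offsets in mm measured along the slice (cranio-caudal) axis, negative caudal and positive cranial, and inclusive boundaries, none of which is stated exactly on this page.

```python
from enum import Enum


class RoiStatus(Enum):
    EVALUATED = "evaluated"
    MASKED = "masked"    # ECA cut-off band: ignored by all measures
    OUTSIDE = "outside"  # not part of the region of interest


# Extents in mm relative to the bifurcation slice (negative = caudal, positive = cranial).
CCA_CAUDAL_MM = 20.0
ICA_CRANIAL_MM = 40.0
ECA_MASK_START_MM = 10.0
ECA_MASK_END_MM = 20.0


def roi_status(artery: str, dz_mm: float) -> RoiStatus:
    """Hypothetical helper: classify a position of an artery w.r.t. the evaluation ROI."""
    if artery == "CCA":
        inside = -CCA_CAUDAL_MM <= dz_mm <= 0.0
        return RoiStatus.EVALUATED if inside else RoiStatus.OUTSIDE
    if artery == "ICA":
        inside = 0.0 <= dz_mm <= ICA_CRANIAL_MM
        return RoiStatus.EVALUATED if inside else RoiStatus.OUTSIDE
    if artery == "ECA":
        if 0.0 <= dz_mm < ECA_MASK_START_MM:
            return RoiStatus.EVALUATED
        if ECA_MASK_START_MM <= dz_mm <= ECA_MASK_END_MM:
            return RoiStatus.MASKED
        return RoiStatus.OUTSIDE
    raise ValueError(f"unknown artery: {artery}")
```

A participant could use such a check to verify that a segmentation reaches at least 20 mm caudal on the CCA and 40 mm cranial on the ICA, and that the ECA is cut inside the masked band.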

The input for the participant is:

  • the CTA dataset (including header information such as voxel sizes and the world coordinate system), and
  • three points, if you participate in the semi-automated category:
    1. a point in the Common Carotid Artery, at the level of the cranial side of the thyroid gland
    2. a point in the Internal Carotid Artery, just before the artery enters the skull base
    3. a point in the External Carotid Artery, where the artery is close to the mandible
The participant will be asked to return the segmented lumen. This segmentation must be represented as a (partial volume) segmentation, i.e. an image with floating point numbers in which each voxel value is the occupancy of the voxel by the vessel lumen: a value of 0 means no lumen is present, and a value of 1 means the voxel is fully occupied by lumen. The voxel values must thus be in the range [0, 1].
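The framework does not prescribe how a method produces partial volumes. One common way, shown here only as an assumption, is to segment on a grid that is `factor` times finer along each axis and average each block of fine voxels into one occupancy value. The function names are hypothetical, and the range check simply restates the [0, 1] requirement above.

```python
import numpy as np


def partial_volume_from_supersampled(mask: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical helper: average factor^3 blocks of a fine binary mask into occupancy fractions."""
    if mask.ndim != 3 or any(s % factor for s in mask.shape):
        raise ValueError("expected a 3D mask whose shape is divisible by factor")
    z, y, x = (s // factor for s in mask.shape)
    # Split every axis into (coarse index, position inside the block) and average the block axes.
    blocks = mask.astype(np.float32).reshape(z, factor, y, factor, x, factor)
    return blocks.mean(axis=(1, 3, 5))


def check_partial_volume(pv: np.ndarray) -> None:
    """Hypothetical pre-upload check: floating point values, finite, within [0, 1]."""
    if not np.issubdtype(pv.dtype, np.floating):
        raise TypeError("partial volume image must hold floating point values")
    if not np.all(np.isfinite(pv)):
        raise ValueError("voxel values must be finite")
    if pv.size and (pv.min() < 0.0 or pv.max() > 1.0):
        raise ValueError("voxel values must lie in [0, 1]")
```

A fine voxel grid of 2 × 2 × 2 per output voxel gives occupancy steps of 1/8; larger factors give smoother partial volumes at a higher memory cost.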

Stenosis Grading

Two different stenosis grades have to be determined for each ICA that needs to be segmented. We use the following NASCET-like definitions for stenosis grading:
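  S_a = \left(1 - \frac{a_m}{a_r}\right) \times 100\%    (1)
  S_d = \left(1 - \frac{d_m}{d_r}\right) \times 100\%    (2)
where S_a is an area-based stenosis grade and S_d is a diameter-based stenosis grade; d_m and d_r are defined analogously to a_m and a_r (below), using minimal diameters instead of cross-sectional areas. The stenosis grade is a value in the range [0, 100], where 0 implies no stenosis and 100 implies a fully occluded vessel.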

In the above formulations, a_m is the minimal cross-sectional area along the CCA and ICA, and a_r is the average cross-sectional area over a distal reference part of the Internal Carotid Artery. The default reference part has a length of 10 mm and is located 20 mm distal of the location of minimal area, measured along the vessel centerline. However, observers are free to change the location and length of the reference part, with the restriction that it must be distal to the minimal area location and must not extend outside the segmented region, i.e. beyond 40 mm cranial of the bifurcation slice.
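To make the area-based definition concrete, the sketch below computes S_a from cross-sectional areas sampled along the centerline. It assumes that positions are in mm and increase distally from the CCA into the ICA, that the samples cover only the segmented region, and that the default reference part starts 20 mm distal of the minimum and spans 10 mm; the function name is hypothetical and this is not the reference implementation.

```python
from typing import Sequence


def area_stenosis_grade(
    positions_mm: Sequence[float],
    areas_mm2: Sequence[float],
    ref_offset_mm: float = 20.0,
    ref_length_mm: float = 10.0,
) -> float:
    """Hypothetical helper: S_a = (1 - a_m / a_r) * 100 from a sampled area profile."""
    if len(positions_mm) != len(areas_mm2) or len(areas_mm2) == 0:
        raise ValueError("need one area per centerline position")
    # a_m: minimal cross-sectional area along the sampled CCA/ICA centerline.
    i_min = min(range(len(areas_mm2)), key=lambda i: areas_mm2[i])
    a_m = areas_mm2[i_min]
    # Reference part (assumed placement): distal of the minimum, within the sampled region.
    start = positions_mm[i_min] + ref_offset_mm
    end = start + ref_length_mm
    ref = [a for s, a in zip(positions_mm, areas_mm2) if start <= s <= end]
    if not ref:
        raise ValueError("reference part lies outside the sampled (segmented) region")
    a_r = sum(ref) / len(ref)  # a_r: average area over the reference part
    return (1.0 - a_m / a_r) * 100.0
```

For example, a profile of 20 mm² everywhere except 5 mm² at the narrowest point gives (1 - 5/20) × 100 = 75. Because a_m is the global minimum, a_m ≤ a_r, so the result stays within [0, 100].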


Figure 2: Minimal diameter lines for various cross-sectional contours.

The second stenosis grade is determined using minimal diameters. The minimal diameter of a cross-section is defined as the length of the shortest straight line that divides the contour into two parts of equal area; see Fig. 2 for examples of minimal diameters for various contour shapes.
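The minimal diameter definition can be approximated numerically. The sketch below assumes a convex polygonal contour in the cross-sectional plane; for each of a set of sampled line directions it finds, by bisection on the line offset, the line that splits the polygon area in half, measures the chord that line cuts through the polygon, and keeps the shortest chord. Non-convex contours (where a line can cross the contour more than twice) and the sampling density are outside what this illustration handles; all names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(poly: List[Point]) -> float:
    """Absolute area of a simple polygon (shoelace formula)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_halfplane(poly: List[Point], nx: float, ny: float, t: float) -> List[Point]:
    """Keep the part of a convex polygon where nx*x + ny*y <= t (one Sutherland-Hodgman pass)."""
    out: List[Point] = []
    for p, q in zip(poly, poly[1:] + poly[:1]):
        dp = nx * p[0] + ny * p[1] - t
        dq = nx * q[0] + ny * q[1] - t
        if dp <= 0.0:
            out.append(p)
        if (dp < 0.0 < dq) or (dq < 0.0 < dp):
            s = dp / (dp - dq)
            out.append((p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1])))
    return out


def bisecting_chord_length(poly: List[Point], theta: float, iters: int = 60) -> float:
    """Length of the chord with normal angle theta that splits the polygon area in half."""
    nx, ny = math.cos(theta), math.sin(theta)
    proj = [nx * x + ny * y for x, y in poly]
    lo, hi = min(proj), max(proj)
    tol = 1e-9 * (hi - lo)
    half = polygon_area(poly) / 2.0
    for _ in range(iters):  # area on the "<= t" side grows monotonically with t
        mid = 0.5 * (lo + hi)
        if polygon_area(clip_halfplane(poly, nx, ny, mid)) < half:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    # Vertices of the clipped polygon lying on the line are the chord end points.
    along = [-ny * x + nx * y for x, y in clip_halfplane(poly, nx, ny, t)
             if abs(nx * x + ny * y - t) <= tol]
    return max(along) - min(along)


def minimal_diameter(poly: List[Point], n_angles: int = 180) -> float:
    """Shortest area-bisecting chord over n_angles sampled directions in [0, pi)."""
    return min(bisecting_chord_length(poly, math.pi * k / n_angles) for k in range(n_angles))
```

For a 4 × 1 rectangle this returns 1 (the short side through the center), whereas a circle of radius r gives approximately 2r, which matches the intuition behind Fig. 2.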

Similar to the lumen segmentation task, there are two versions of the stenosis grading task: a fully automated version, where the stenosis grading uses only the CTA dataset and the specification of whether the left or right side needs to be graded, and a semi-automated version, where the algorithm may also use the three points (one in each artery of the bifurcation) supplied with the data. The input data available for this task is identical to the data for the lumen segmentation task.

Evaluation measures and ranking

Lumen segmentation

The partial volume lumen segmentations will be evaluated using the following four performance measures:

  1. The Dice similarity index D_si:
      D_{si} = \frac{2\,|pv_r \cap pv_p|}{|pv_r| + |pv_p|}    (3)
    where pv_r and pv_p are the reference and a participant's partial volumes, the intersection operation is the voxelwise minimum operation, and |.| is the volume, i.e. the integral of the voxel values over the complete image.
  2. The mean surface distance D_msd:
      D_{msd} = \frac{1}{|S_r| + |S_p|} \left( \int_{S_r} |sdm_p| \, ds + \int_{S_p} |sdm_r| \, ds \right)    (4)
    where sdm_r and sdm_p are the signed distance maps of the reference and a participant's segmentation, S_r and S_p are the lumen boundary surfaces (isosurfaces of the signed distance maps at the value 0), and |S_i| is the surface area of surface S_i, i.e. |S_i| = \int_{S_i} ds.
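  3. The Hausdorff surface distance D_Hausdorff:
      D_{Hausdorff} = \max\left( \max_{s \in S_r} |sdm_p(s)|, \; \max_{s \in S_p} |sdm_r(s)| \right)    (5)
  4. The maximum surface distance D_max:   (6)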
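The Dice similarity index in equation (3) translates directly to array code: the intersection of two partial volumes is their voxelwise minimum, and a volume is the sum of the voxel values (the voxel size cancels in the ratio). The sketch below is hypothetical, not the framework's evaluation software; the optional boolean `roi` argument stands in for the region of interest and ECA mask, and returning 1.0 when both volumes are empty is an assumption.

```python
from typing import Optional

import numpy as np


def dice_partial_volume(pv_r, pv_p, roi: Optional[np.ndarray] = None) -> float:
    """Hypothetical helper: Dice index of two partial volume images (values in [0, 1])."""
    pv_r = np.asarray(pv_r, dtype=np.float64)
    pv_p = np.asarray(pv_p, dtype=np.float64)
    if pv_r.shape != pv_p.shape:
        raise ValueError("reference and participant images must have the same shape")
    if roi is not None:
        # Evaluate only voxels inside the region of interest (masked voxels are dropped).
        pv_r, pv_p = pv_r[roi], pv_p[roi]
    intersection = np.minimum(pv_r, pv_p).sum()  # voxelwise minimum = partial volume intersection
    total = pv_r.sum() + pv_p.sum()
    return 2.0 * intersection / total if total > 0 else 1.0
```

With a reference of [1, 1, 0, 0] and a participant volume of [0.5, 1, 0, 0], the intersection is 1.5 and the index is 2 × 1.5 / 3.5 ≈ 0.857; a half-filled boundary voxel thus counts for half, rather than all or nothing.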

All distance measures are symmetric, and all these measures are evaluated only within the region of interest specified in the Lumen Segmentation section above. Furthermore, the mask for the distal part of the ECA is also used in all of the above measures. The measures above yield one performance value per participant, per dataset, and per performance measure. For each dataset and each performance measure, a ranking of the participants is made, i.e. with N datasets and 4 measures, N × 4 rankings are obtained. The final ranking of a participant is obtained by averaging its ranks over all these N × 4 rankings.
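The ranking scheme above can be sketched as follows: rank the participants for every (measure, dataset) pair, with rank 1 for the best value, and average each participant's ranks. Measure names, the `higher_is_better` mapping (Dice is better when higher, distances when lower), and giving tied participants the average of the ranks they span are assumptions for illustration; the page does not state how ties are handled.

```python
from typing import Dict, List, Mapping


def average_ranks(
    scores: Mapping[str, Mapping[str, Mapping[str, float]]],
    higher_is_better: Mapping[str, bool],
) -> Dict[str, float]:
    """Hypothetical helper: scores[measure][dataset][participant] -> participant's mean rank."""
    ranks: Dict[str, List[float]] = {}
    for measure, per_dataset in scores.items():
        sign = -1.0 if higher_is_better[measure] else 1.0
        for per_participant in per_dataset.values():
            # Best participant first: sort on the value, flipped for "higher is better" measures.
            ordered = sorted(per_participant, key=lambda p: sign * per_participant[p])
            i = 0
            while i < len(ordered):
                # Group equal values and give them the average of the ranks they span.
                j = i
                while (j + 1 < len(ordered)
                       and per_participant[ordered[j + 1]] == per_participant[ordered[i]]):
                    j += 1
                shared_rank = (i + j) / 2.0 + 1.0
                for p in ordered[i : j + 1]:
                    ranks.setdefault(p, []).append(shared_rank)
                i = j + 1
    return {p: sum(r) / len(r) for p, r in ranks.items()}
```

Averaging ranks rather than raw values keeps one measure with a large numeric range (such as a maximum distance in mm) from dominating the final ordering.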

Stenosis grading

The evaluation of the stenosis grade is straightforward: the error in stenosis grade is the absolute difference between the reference standard value and the value determined by a participant. Because revealing the exact error per dataset would more or less reveal the reference stenosis grades, the stenosis errors are not communicated per dataset, but only per ensemble (testing or on-site). The same holds for the ranking. The final ranking, however, is determined by averaging the (hidden) errors over all datasets and both stenosis grades (diameter and area).
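The stenosis error and the resulting ordering can be sketched in a few lines. The dictionary layout (dataset → {"area": ..., "diameter": ...}) and the function names are hypothetical; the sketch only restates the rule above: absolute errors, averaged over datasets and both grades, lowest mean error first.

```python
from typing import Dict, List, Mapping

GRADES = ("area", "diameter")  # hypothetical keys for S_a and S_d


def mean_stenosis_error(
    reference: Mapping[str, Mapping[str, float]],
    submitted: Mapping[str, Mapping[str, float]],
) -> float:
    """Average absolute error over all datasets and both stenosis grades."""
    errors = [abs(submitted[ds][g] - reference[ds][g]) for ds in reference for g in GRADES]
    return sum(errors) / len(errors)


def rank_by_stenosis_error(
    reference: Mapping[str, Mapping[str, float]],
    submissions: Dict[str, Mapping[str, Mapping[str, float]]],
) -> List[str]:
    """Team names ordered from lowest to highest mean stenosis error."""
    return sorted(submissions, key=lambda team: mean_stenosis_error(reference, submissions[team]))
```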

Data format

Information on the data format and submission format can be found in this document.

Submissions

The website can be used to upload processed data: use the submit button on the Download/Submit page. Uploaded data should be in the format described in this document: one subdirectory per challenge, named according to the input CTA data, containing the appropriate data files (ROI and partial volume for the lumen segmentation, and area and diameter stenosis for stenosis grading). In addition to adhering to the directory structure and file naming conventions, also note the following:

  • only submit a complete set of data (complete training, complete testing, or both),
  • specify which data you submitted (lumen and/or stenosis),
  • specify whether you join the automated or the semi-automated competition,
  • for your convenience, you can also attach a name to your submission,
  • for uploaded archives, we accept rar, zip and tgz formats.
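Because incomplete sets cannot be combined later, it can help to check completeness locally before uploading. The sketch below assumes one subdirectory per dataset under a submission root and uses hypothetical dataset IDs; the authoritative layout and file names are those in the data format document. It packs the root as a zip file with the standard library's `shutil.make_archive`.

```python
import os
import shutil
from typing import Iterable, List


def missing_datasets(root: str, expected_ids: Iterable[str]) -> List[str]:
    """Hypothetical check: expected dataset subdirectories that are absent under root."""
    return sorted(d for d in expected_ids if not os.path.isdir(os.path.join(root, d)))


def package_submission(root: str, expected_ids: Iterable[str], archive_base: str) -> str:
    """Refuse to pack an incomplete set; otherwise create <archive_base>.zip from root."""
    expected = list(expected_ids)
    missing = missing_datasets(root, expected)
    if missing:
        raise ValueError(f"incomplete submission, missing: {missing}")
    # Archive the contents of root; archive_base should lie outside root.
    return shutil.make_archive(archive_base, "zip", root_dir=root)
```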

Checking your submission

The processing of the submitted data includes some very basic checks on the input data, such as checking whether all files are present and whether your segmentation overlaps with the region of interest of the reference standard. These checks are run for all possible input datasets, both training and testing. After processing (which usually finishes within half an hour for lumen submissions and within a few minutes for stenosis submissions, but can take a few hours under heavy load), you can view your submissions by clicking on the submissions button on the Download/Submit page.

For each submission, the number of errors is listed. As we try to process all possible datasets, you will get errors on the training data if you only submitted test data, and vice versa. These errors should be ignored; you should only check whether the data you submitted was processed successfully. If an error has been detected, you can either upload a new set of data or mark the dataset as failed. If you upload a new set of data, make it a complete set again, as there is no way to combine submission results. A dataset marked as failed is always ranked worst, but its performance measures are not taken into account when averaging the performance measures over all datasets. If you do not mark the dataset as failed, it will get default values for the performance measures, which will be much worse than your average performance. Note that, as the final ordering is based on ranks, marking a dataset as failed does not affect the ranking of your method.
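The difference between marking a dataset as failed and leaving it with default values can be illustrated as follows: a failed dataset (represented here as `None`, an assumption) is left out of the average of a performance measure but still takes the last place in that dataset's ranking. The helper names are hypothetical, and ties among non-failed participants are not handled in this short sketch.

```python
import math
from typing import Dict, List, Mapping, Optional


def mean_excluding_failed(values: List[Optional[float]]) -> float:
    """Average of a performance measure over datasets, skipping datasets marked as failed."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else math.nan


def ranks_with_failures(values: Mapping[str, Optional[float]], higher_is_better: bool) -> Dict[str, float]:
    """Ranks for one dataset and measure; failed participants share the worst place(s)."""
    ok = {p: v for p, v in values.items() if v is not None}
    ordered = sorted(ok, key=lambda p: -ok[p] if higher_is_better else ok[p])
    ranks = {p: float(i + 1) for i, p in enumerate(ordered)}
    failed = [p for p, v in values.items() if v is None]
    if failed:
        # All failed participants come after everyone else and share the average of those ranks.
        shared = len(ordered) + (len(failed) + 1) / 2.0
        for p in failed:
            ranks[p] = shared
    return ranks
```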

For the training data (if submitted), you can inspect the performance measures immediately after processing is finished. They should be similar to the values that result from applying the evaluation software provided (contact the organizers if you detect large differences!).

For the testing data, no performance measures are shown; you can only see the number of errors detected.

Confirming your submission

If you are confident that the submitted testing data is fine, you can confirm your submission by sending an e-mail to cls2009.bigr.nl with the subject "cls2009 submission confirmation". In the body, clearly state your team name, which submission you want to confirm (by providing the date/time and/or label of the submission), and whether you confirm the lumen segmentation, the stenosis grading, or both. Note that you can confirm only once; after confirmation you can NOT upload a new set of data and get it confirmed.

Viewing your results

After we have acknowledged your confirmation by e-mail, you can view your results. We have also submitted the results of the three observers (ObserverA is the best observer, ObserverC the worst), and these results are shown together with yours, so you can see how your method performs relative to our manual observers.

For the lumen segmentation, both average values and a complete list of performance measures for each dataset are provided. For the stenosis grading, only the aggregate values are given.

The tables can be sorted by clicking on the header of a column. To sort the table with detailed lumen scores, first click on the header of the measure you want to sort by, and then click on the header of the column with dataset IDs to regroup the rows by dataset.

Results on website

If you want to include your results in a publication and make them visible to everyone, you have to send us a copy of the paper or a link to it, and we will make the results viewable for everyone. If you are a commercial company and do not want to disclose your method, you should provide us with the exact software version number and a precise description of how the results were obtained.