About
Introduction
Reliable carotid bifurcation lumen segmentation and stenosis grading
from computed tomography angiography (CTA) data are relevant in
clinical practice. Although numerous methods have been presented for
lumen segmentation, no standardized evaluation methodology has been
published so far to reliably evaluate and compare the performance of
existing or newly developed carotid bifurcation lumen segmentation
and stenosis grading algorithms.
This evaluation framework provides a large-scale standardized
evaluation methodology and reference database for the quantitative
evaluation of carotid bifurcation lumen segmentation and
stenosis grading algorithms.
Well-defined measures are presented, a multisite, multivendor database
containing 56 carotid CTA datasets with a corresponding reference standard
is described and made available, and different methods are available
to extract statistics from the evaluation results.
Using this framework is simple; just follow this recipe:
- Register as a team and send in the data confidentiality form.
- Download the datasets.
- After processing the datasets, upload the resulting segmentations
and/or stenosis gradings.
- The evaluation measures will be determined automatically with software
running on the website, using the reference standard that was
determined by averaging the results of three human observers.
- After confirmation of the processing, the results will be available.
Currently, 8 carotid bifurcation lumen segmentation methods and
3 stenosis grading methods have been evaluated with the framework.
Framework test
The CLS 2009 framework was tested during one of the challenges of the
3rd MICCAI Workshop in the series "3D Segmentation in the Clinic:
a Grand Challenge III", which was held on 24 September 2009 at the
12th International Conference on Medical Image Computing and Computer
Assisted Intervention (MICCAI). Proceedings of this workshop can be
found at the MIDAS Journal website.
The framework is open for new submissions.
Evaluation framework
More details about the evaluation framework (data, reference standard,
measures, scores and ranking) can be found below and
in this document.
Framework use
The framework can be used to evaluate methods that perform:
- Carotid bifurcation lumen segmentation
- Internal carotid artery stenosis grading
Each team can participate in either one of the tasks,
or in both. This page briefly describes, in turn, the
tasks to be performed, the data used,
the manual annotation, the reference standard, and the
evaluation criteria.
Lumen Segmentation
The Common Carotid Artery (CCA) and Internal Carotid Artery
(ICA), see Fig. 1, are clinically the most relevant arteries
of the Carotid Bifurcation. Therefore, the segmentation
evaluation focuses on these two arteries. A small part of the
External Carotid Artery (ECA) is also included, to prevent
evaluation issues at the location where the ECA bifurcates
from the ICA. Additionally, it allows us to include a complete
bifurcation in the evaluation.
The goal of this category is to
accurately segment the lumen of the Carotid Bifurcation in a
Computed Tomography Angiography (CTA) dataset. There are two
versions: a fully automated version, and a
semi-automated version where three initial points are provided.
The region to be segmented is defined around the bifurcation
slice, which we define as the first (caudal to cranial) slice
where the lumen of the CCA appears as two separate lumens:
the lumen of the ICA and the lumen of the ECA. The segmentation
must contain the CCA, starting at least 20 mm caudal of the
bifurcation slice, the ICA, up to at least 40 mm cranial of the
bifurcation slice, and the ECA, up to between 10 and 20 mm
cranial of the bifurcation slice, see also Fig. 1.
 
Figure 1: Schematic depiction of the relevant region of interest, and a rendering of this
region for one of the datasets.
The performance measures are only determined over the region of
interest as specified above. However, the bifurcation slice is
not communicated to the participant.
Therefore, the participants should make sure that their
segmentation at least includes this region. Our definition of
the bifurcation slice, and the specified regions, should be
sufficient to determine a suitable region of interest for the
segmentations.
For the External Carotid Artery, the segmented
lumen should be cut between 10 and 20 mm cranial of the
bifurcation slice. To allow for some flexibility in cutting of
the ECA, the region around the ECA between 10 and 20 mm cranial
of the bifurcation slice is a "masked" region, where the
evaluation measures will not be evaluated, see also Fig. 1.
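The region definition above can be sketched as a small classifier. This is only an illustration: the function name, the status labels, and the handling of the exact boundary positions are assumptions, and offsets are taken as signed distances in mm from the bifurcation slice (cranial positive).

```python
# Extents in mm relative to the bifurcation slice (positive = cranial).
CCA_CAUDAL_MM = 20.0   # CCA region starts 20 mm caudal of the bifurcation slice
ICA_CRANIAL_MM = 40.0  # ICA region ends 40 mm cranial of the bifurcation slice
ECA_EVAL_MM = 10.0     # ECA is evaluated up to 10 mm cranial
ECA_MASK_MM = 20.0     # ECA between 10 and 20 mm cranial is masked


def region_status(artery, offset_mm):
    """Classify a position as 'evaluated', 'masked', or 'outside'.

    artery    -- 'CCA', 'ICA', or 'ECA'
    offset_mm -- signed distance from the bifurcation slice, cranial positive
    Boundary handling (<= versus <) is an assumption of this sketch.
    """
    if artery == "CCA":
        return "evaluated" if -CCA_CAUDAL_MM <= offset_mm <= 0.0 else "outside"
    if artery == "ICA":
        return "evaluated" if 0.0 <= offset_mm <= ICA_CRANIAL_MM else "outside"
    if artery == "ECA":
        if 0.0 <= offset_mm <= ECA_EVAL_MM:
            return "evaluated"
        if ECA_EVAL_MM < offset_mm <= ECA_MASK_MM:
            return "masked"
        return "outside"
    raise ValueError("unknown artery: %r" % (artery,))
```

For example, `region_status("ECA", 15.0)` falls in the masked band, so a segmentation may or may not include lumen there without affecting the measures.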
The input for the participant is:
- the CTA dataset (including header information such as voxel sizes and
world coordinate system), and
- three points if you join the semi-automatic method category:
  - a point in the Common Carotid Artery, at the level of the
  cranial side of the thyroid gland
  - a point in the Internal Carotid Artery, just before the
  artery enters the skull base
  - a point in the External Carotid Artery, where the artery is
  close to the mandible
The participant will be asked to return the segmented lumen.
This segmentation
must be represented as a (partial volume) segmentation, i.e.
an image with floating point numbers, where each voxel value
contains the occupancy of the voxel by the vessel lumen: a
value of 0 means no lumen present, and a value of 1 means
fully occupied with lumen. The voxel value must thus be in the
range [0,1].
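A minimal sketch of what this representation implies, assuming the voxel values are available as a flat sequence of floats (the function names are hypothetical and not part of the framework's software): each value must lie in [0, 1], and the lumen volume is the sum of the occupancies times the voxel volume.

```python
def check_partial_volume(voxels):
    """Return True if every voxel value lies in the range [0, 1].

    NaN values fail the comparison and are therefore rejected as well.
    """
    return all(0.0 <= float(v) <= 1.0 for v in voxels)


def lumen_volume_mm3(voxels, voxel_size_mm):
    """Integrate the partial volume values: sum of occupancies x voxel volume.

    voxel_size_mm -- (x, y, z) voxel sizes in mm, as found in the image header
    """
    sx, sy, sz = voxel_size_mm
    return sum(voxels) * sx * sy * sz
```

For instance, four voxels with occupancies 0, 0.5, 1, and 0.5 at a voxel size of 0.5 x 0.5 x 1.0 mm represent 2 voxels' worth of lumen, i.e. 0.5 mm^3.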
Stenosis Grading
Two different stenosis grades have to be determined for each
ICA that needs to be segmented.
We use the following NASCET-like definitions for stenosis
grading:
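$$S_a = 100\% \times \left(1 - \frac{a_m}{a_r}\right) \qquad (1)$$
$$S_d = 100\% \times \left(1 - \frac{d_m}{d_r}\right) \qquad (2)$$
where S_a is an area-based stenosis grade,
and S_d is a
diameter-based stenosis grade. The stenosis grade is a value in
the range [0, 100], where 0 implies no stenosis, and 100
implies a fully occluded vessel.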
In the above formulations, a_m
is the minimal cross-sectional area along the CCA and ICA, and
a_r is the average cross-sectional area over
a distal reference part of the
Internal Carotid Artery. The default reference part has a length
of 10 mm, and is 20 mm distal of the location of minimal area,
measured along the vessel centerline. However, observers are
free to change the location and length of the reference area,
with the restriction that it must be distal to the minimal area
location, and must not extend outside the segmented region, i.e.
beyond 40 mm cranial of the bifurcation plane.
Figure 2: Minimal diameter lines for various
cross-sectional contours.
The second
stenosis grade is determined using minimal diameters. The
minimal diameter of a cross-section is defined as the shortest
straight line that divides the contour into two equal-sized areas;
see Fig. 2 for examples of minimal diameters for various contour
shapes.
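The grading can be illustrated with a short sketch. It assumes the usual NASCET-style form, 100 x (1 - minimum / reference), applied to either areas or minimal diameters, and it assumes that the cross-sectional areas are sampled at a fixed spacing along the centerline from proximal (CCA) to distal (ICA). The function names and the reading of "20 mm distal" as the start of the 10 mm reference part are assumptions of this example, not the organizers' implementation.

```python
def stenosis_grade(minimal, reference):
    """NASCET-like grade in percent from a minimal and a reference measurement.

    Works for areas (minimal area, reference area) and for diameters
    (minimal diameter, reference diameter) alike.
    """
    if reference <= 0:
        raise ValueError("reference measurement must be positive")
    return 100.0 * (1.0 - minimal / reference)


def default_reference_area(areas, spacing_mm, offset_mm=20.0, length_mm=10.0):
    """Average area over the default reference part.

    areas      -- cross-sectional areas along the centerline, proximal to distal
    spacing_mm -- distance between consecutive samples along the centerline
    The window starts offset_mm distal of the minimal area and spans length_mm
    (both interpretations are assumptions of this sketch).
    """
    i_min = min(range(len(areas)), key=areas.__getitem__)
    start = i_min + int(round(offset_mm / spacing_mm))
    stop = start + int(round(length_mm / spacing_mm)) + 1
    if stop > len(areas):
        raise ValueError("reference part extends outside the sampled region")
    window = areas[start:stop]
    return sum(window) / len(window)
```

With areas `[20, 18, 6, 10, 14, 16, 16, 18, 20, 20]` sampled every 5 mm, the minimum of 6 lies at index 2, the reference window covers the samples 16, 18, and 20, and the resulting area-based grade is 100 x (1 - 6/18), roughly 66.7.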
Similar to the lumen segmentation task, there are
two versions of the stenosis grading task: a fully automated
version, where the stenosis grading uses only the CTA dataset and
the specification whether the left or right side needs to be
graded, and a semi-automated version, where the algorithm also
may use the three points in each of the arteries of the
bifurcation (as supplied with the data). The input data available
for this task is identical to the data for the lumen
segmentation task.
Evaluation measures and ranking
Lumen segmentation
The partial volume lumen segmentations will be evaluated using
the following four performance measures:
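- The Dice similarity index D_si:
$$D_{si} = \frac{2\,|pv_r \cap pv_p|}{|pv_r| + |pv_p|} \qquad (3)$$
where pv_r and pv_p are the reference and a participant's partial
volumes, the intersection operation is the voxelwise minimum
operation, and |.| is the volume, i.e. the integration of the
voxel values over the complete image.
- The mean surface distance D_msd:
$$D_{msd} = \frac{1}{2}\left(\frac{\int_{S_r} |sdm_p|\,ds}{|S_r|} + \frac{\int_{S_p} |sdm_r|\,ds}{|S_p|}\right) \qquad (4)$$
where sdm_r and sdm_p are the signed distance maps of the reference
and a participant's segmentation, S_r and S_p are the lumen
boundary surfaces (isosurfaces of the signed distance maps at the
value 0), and |S_i| is the surface area of surface S_i,
i.e. $|S_i| = \int_{S_i} ds$.
- The root mean squared surface distance D_rmssd:
$$D_{rmssd} = \sqrt{\frac{1}{2}\left(\frac{\int_{S_r} sdm_p^2\,ds}{|S_r|} + \frac{\int_{S_p} sdm_r^2\,ds}{|S_p|}\right)} \qquad (5)$$
- The maximum surface distance D_max:
$$D_{max} = \max\left(\max_{x \in S_r} |sdm_p(x)|,\ \max_{x \in S_p} |sdm_r(x)|\right) \qquad (6)$$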
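As an illustration of the Dice index on partial volumes, the following sketch assumes the standard Dice form, 2 |A ∩ B| / (|A| + |B|), with the intersection taken as the voxelwise minimum as described above. The two segmentations are assumed to be flat sequences of voxel values on the same grid; restriction to the region of interest and the ECA mask is not handled here.

```python
def dice_partial_volume(pv_ref, pv_part):
    """Dice similarity index for two partial volume segmentations.

    pv_ref, pv_part -- equally long sequences of voxel values in [0, 1]
    The voxel volume cancels out, so plain sums of voxel values suffice.
    """
    if len(pv_ref) != len(pv_part):
        raise ValueError("segmentations must be defined on the same grid")
    # Intersection of partial volumes: voxelwise minimum, then integrate.
    intersection = sum(min(r, p) for r, p in zip(pv_ref, pv_part))
    total = sum(pv_ref) + sum(pv_part)
    if total == 0:
        raise ValueError("both segmentations are empty")
    return 2.0 * intersection / total
```

Two segmentations that each contain 1.5 voxels of lumen and share 1 voxel give an index of 2 x 1 / 3, about 0.67; identical segmentations give 1.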
All distance measures are symmetric, and all these measures are
evaluated only in the region of interest that is specified in the
Lumen Segmentation section above. Furthermore, the mask for the distal
part of the ECA is also used in all the above measures. The measures
above lead to one performance value per participant for each dataset
and for each performance measure. Per dataset and per performance
measure, a ranking of the participants is made, i.e. with N datasets
and the 4 measures, N * 4 rankings are obtained. The final
ranking for a participant is obtained by averaging the ranks of
that participant over all these N * 4 rankings.
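The rank averaging can be sketched as follows. The data layout (`results[dataset][measure][team]`), the function names, and the treatment of ties (tied teams share the average of their ranks) are assumptions of this example; the text above does not specify how ties are resolved. Note that a higher Dice index is better, whereas lower distances are better.

```python
def rank_values(values, higher_is_better):
    """Rank teams for one dataset and one measure; rank 1 is best.

    values -- dict mapping team name to its performance value
    Tied teams share the average of the ranks they span (assumption).
    """
    order = sorted(values, key=values.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1 .. j+1
        for name in order[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def final_ranking(results, higher_is_better):
    """Average each team's rank over all dataset/measure combinations.

    results          -- results[dataset][measure][team] -> value
    higher_is_better -- dict mapping measure name to True/False
    """
    totals, counts = {}, {}
    for measures in results.values():
        for measure, values in measures.items():
            ranks = rank_values(values, higher_is_better[measure])
            for team, rank in ranks.items():
                totals[team] = totals.get(team, 0.0) + rank
                counts[team] = counts.get(team, 0) + 1
    return {team: totals[team] / counts[team] for team in totals}
```

With two datasets and two measures, a team ranked 1, 2, 1, and 1 ends up with an average rank of 1.25.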
Stenosis grading
The evaluation of the stenosis grade is straightforward: the
absolute difference between the reference standard value and the
value determined by a participant is the error in stenosis grade.
As revealing the (exact) error per dataset also more or less
reveals the reference stenosis grades, the stenosis errors are not
communicated per dataset, but only per ensemble (testing or
on-site). The same holds for the ranking. The final ranking,
however, is determined by averaging the (hidden) errors per
dataset and stenosis grade (diameter and area).
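A minimal sketch of this error measure, assuming the grades are stored per dataset under the hypothetical keys "area" and "diameter": the error is the absolute difference to the reference value, averaged over all datasets and both grades.

```python
def mean_stenosis_error(reference, submitted):
    """Mean absolute stenosis error over all datasets and both grades.

    reference, submitted -- dicts mapping a dataset id to
                            {"area": grade, "diameter": grade} in percent
    """
    errors = []
    for dataset, ref_grades in reference.items():
        for kind in ("area", "diameter"):
            errors.append(abs(ref_grades[kind] - submitted[dataset][kind]))
    return sum(errors) / len(errors)
```

For two datasets with absolute errors of 10, 5, 5, and 0 percentage points, the aggregate error is 5.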
Data format
Information on the data format and submission format can be found in
this document.
Submissions
The website can be used to upload processed data. Use the submit button on the
Download/Submit page to upload processed data. Uploaded data should be in the format
as described in this document:
one subdirectory per challenge, named
according to the input CTA data, and the appropriate data files (ROI and partial
volume for lumen, and area- and diameter-based stenosis for stenosis grading).
Next to adhering to the directory structure and file naming conventions, also
note the following:
- only submit a complete set of data (complete training, complete testing, or
both)
- specify which data you submitted (lumen and/or stenosis),
- specify whether you join the automated or the semi-automated competition,
- for your convenience, you can also attach a name to your submission,
- for uploaded archives, we accept rar, zip and tgz formats.
Checking your submission
The processing of the submitted data contains some very basic checks on the input
data, such as checking whether all files are present and whether your segmentation overlaps
with the region of interest of the reference standard. This check is for all possible input
datasets, both training and testing.
After processing (which may take a while), your submissions can be viewed by clicking
on the submissions button on the Download/Submit page.
For each submission, the number of errors is listed. As we try to process all possible
datasets, you will get errors on the training data if you only submitted test data, and
vice versa. These errors should be ignored; you should only check whether the data you
submitted was successfully processed.
If an error has been detected, you can either upload a new set of data,
or mark the dataset as failed. If you upload a new set of data, make it a complete set
again, as there is no way to combine submission results.
Marking a dataset as failed will always rank that dataset as worst, but its
performance measures will not be taken into account when averaging the
performance measures over all datasets. If you do not mark your dataset as failed, it
will get default values for the performance measures, which will be much worse than your
average performance. Note that, as the final ordering is on the ranks, marking as failed
does not affect the ranking of your method.
For the training data (if submitted), you can inspect the performance measures
immediately after processing is finished.
They should be similar to the values that result from applying the evaluation software
provided (contact the organizers if you detect large differences!).
For the testing data, no performance measures are shown; you can only see the number of
errors detected.
Confirming your submission
If you are confident that the testing data submitted is fine, you can confirm
your submission by sending an e-mail to
cls2009.bigr.nl,
with subject "cls2009 submission confirmation". In the body, clearly state your team name,
which submission you want to confirm (by providing the date/time and/or label of the submission),
and whether you confirm the lumen segmentation, the stenosis grading, or both.
Note that you can confirm only once: after confirmation you can NOT
upload a new set of data and get it confirmed.
Viewing your results
After we have acknowledged your confirmation by e-mail, you can view your results.
We also submitted the results of the three observers (ObserverA is the best observer,
ObserverC is the worst observer), and show these results together with your results.
This shows how your method performs relative to our manual observers.
For the lumen segmentation, both average values and
a complete list of performance measures for each dataset are provided. For the stenosis
grading, only the aggregate values are given.
The tables can be sorted by clicking on the header of a column. If you want to sort
the table with detailed lumen scores, first click on the header for sorting some measure,
and then click on the header of the column with dataset ids, to regroup the rows on dataset.
Including in paper
You should copy the contents of the table with average
performance values (including the three observers) to your paper. If you address both
lumen segmentation and stenosis grading, copy both relevant tables. Sort the table on
the average rank (last column), from high rank to low rank.