http://mlearn.ics.uci.edu/databases/mammographic-masses/

1. Title: Mammographic Mass Data

2. Sources:

   (a) Original owners of database:
       Prof. Dr. Rüdiger Schulz-Wendtland
       Institute of Radiology, Gynaecological Radiology,
       University Erlangen-Nuremberg
       Universitätsstraße 21-23
       91054 Erlangen, Germany

   (b) Donor of database:
       Matthias Elter
       Fraunhofer Institute for Integrated Circuits (IIS)
       Image Processing and Medical Engineering Department (BMT)
       Am Wolfsmantel 33
       91058 Erlangen, Germany
       matthias.elter@iis.fraunhofer.de
       (49) 9131-7767327

   (c) Date received: October 2007

3. Past Usage:

   M. Elter, R. Schulz-Wendtland and T. Wittenberg (2007)
   The prediction of breast cancer biopsy outcomes using two CAD approaches
   that both emphasize an intelligible decision process.
   Medical Physics 34(11), pp. 4164-4172

Abstract:
Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in the last several years. These systems help physicians in their decision to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short term follow-up examination instead. We present two novel CAD approaches that both emphasize an intelligible decision process to predict breast biopsy outcomes from BI-RADS™ findings. An intelligible reasoning process is an important requirement for the acceptance of CAD systems by physicians. The first approach induces a global model based on decision-tree learning. The second approach is based on case-based reasoning and applies an entropic similarity measure.
We have evaluated the performance of both CAD approaches on two large publicly available mammography reference databases using receiver operating characteristic (ROC) analysis, bootstrap sampling, and the ANOVA statistical significance test. Both approaches outperform the diagnosis decisions of the physicians. Hence, both systems have the potential to reduce the number of unnecessary breast biopsies in clinical practice. A comparison of the performance of the proposed decision tree and CBR approaches with a state of the art approach based on artificial neural networks (ANN) shows that the CBR approach performs slightly better than the ANN approach, which in turn results in slightly better performance than the decision-tree approach. The differences are statistically significant (p value <0.001). On 2100 masses extracted from the DDSM database, the CBR approach for example resulted in an area under the ROC curve of A(z)=0.89±0.01, the decision-tree approach in A(z)=0.87±0.01, and the ANN approach in A(z)=0.88±0.01.
©2007 American Association of Physicists in Medicine

4. Relevant Information:

Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in recent years. These systems help physicians in their decision to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short term follow-up examination instead.

This data set can be used to predict the severity (benign or malignant) of a mammographic mass lesion from BI-RADS attributes and the patient's age.
It contains a BI-RADS assessment, the patient's age and three BI-RADS attributes together with the ground truth (the severity field) for 516 benign and 445 malignant masses that have been identified on full field digital mammograms collected at the Institute of Radiology of the University Erlangen-Nuremberg between 2003 and 2006. Each instance has an associated BI-RADS assessment ranging from 1 (definitely benign) to 5 (highly suggestive of malignancy) assigned in a double-review process by physicians. Assuming that all cases with BI-RADS assessments greater than or equal to a given value (varying from 1 to 5) are malignant and the other cases benign, sensitivities and associated specificities can be calculated. These can be an indication of how well a CAD system performs compared to the radiologists.

5. Number of Instances: 961

6. Number of Attributes: 6 (1 goal field, 1 non-predictive, 4 predictive attributes)

7. Attribute Information:

   1. BI-RADS assessment: 1 (definitely benign) to 5 (highly suggestive of malignancy) (ordinal)
   2. Age: patient's age in years (integer)
   3. Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)
   4. Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)
   5. Density: mass density: high=1 iso=2 low=3 fat-containing=4 (ordinal)
   6. Severity: benign=0 or malignant=1 (binominal)

8. Missing Attribute Values: Yes

   - BI-RADS assessment: 2
   - Age: 5
   - Shape: 31
   - Margin: 48
   - Density: 76
   - Severity: 0

9. Class Distribution: benign: 516; malignant: 445
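
Example (illustrative sketch, not part of the original documentation):

The radiologist baseline described under Relevant Information (treat every case whose BI-RADS assessment is greater than or equal to a cut-off as malignant, then compute sensitivity and specificity) could be sketched in Python roughly as below. Several things here are assumptions rather than facts from this description: that rows are comma-separated in the attribute order of section 7, that missing values are marked with "?", and the helper names parse_row and sensitivity_specificity, which are hypothetical. The sample rows are made up for illustration and are not taken from the data set.

```python
from typing import List, Optional, Tuple

# Attribute order from "7. Attribute Information". The comma separator and
# the "?" missing-value marker are assumptions about the data file layout.
COLUMNS = ("bi_rads", "age", "shape", "margin", "density", "severity")

Record = Tuple[Optional[int], ...]


def parse_row(line: str) -> Record:
    """Parse one comma-separated row, mapping the assumed "?" marker to None."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return tuple(None if field == "?" else int(field) for field in fields)


def sensitivity_specificity(records: List[Record], threshold: int) -> Tuple[float, float]:
    """Call a case malignant when BI-RADS >= threshold.

    Rows with a missing BI-RADS assessment or severity are skipped.
    """
    tp = fn = tn = fp = 0
    for record in records:
        bi_rads, severity = record[0], record[5]
        if bi_rads is None or severity is None:
            continue
        predicted_malignant = bi_rads >= threshold
        if severity == 1:
            if predicted_malignant:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_malignant:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up rows for illustration only; NOT taken from the data set.
SAMPLE_ROWS = [
    "5,70,4,5,3,1",
    "4,61,4,4,?,1",
    "3,52,2,1,3,1",
    "4,48,1,1,3,0",
    "2,39,1,1,2,0",
    "?,55,2,?,3,0",
]

if __name__ == "__main__":
    sample_records = [parse_row(row) for row in SAMPLE_ROWS]
    for cutoff in range(1, 6):
        sens, spec = sensitivity_specificity(sample_records, cutoff)
        print(f"BI-RADS >= {cutoff}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sweeping the cut-off from 1 to 5 yields one (sensitivity, specificity) pair per value, which gives the operating points a CAD system's ROC curve can be compared against. Skipping rows with a missing BI-RADS assessment is one possible choice; imputing them would be another.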