Unsupervised classification of specialty coffees in Homogeneous sensory attributes through machine learning





Brazil is the largest exporter of coffee beans, accounting for 29% of world exports, and specialty coffees make up 15% of that volume. Research is therefore carried out to identify different market segments and direct the end consumer to a better-quality product, and new technologies are explored to meet the growing demand for high-quality coffees. This article proposes the use of machine learning techniques combined with projection pursuit to build unsupervised classification models. The models were applied to a sensory acceptance experiment in which four groups of trained and untrained consumers evaluated four classes of specialty coffees on the sensory characteristics aroma, body, sweetness and overall score. To evaluate classifier performance, the models were fitted to the reduced-dimension data using all instances and assuming four groupings. The groupings formed were then compared with the pre-established classes to validate the models. Hit and error rates were obtained, taking into account the false positive and false negative rates, the sensitivity and the accuracy of the classification methods. It was concluded that using machine learning on reduced-dimension data is feasible: it allows unsupervised classification of specialty coffees produced at different altitudes and by different processing methods, despite the heterogeneity among the consumers involved in the sensory analysis and the high homogeneity of sensory attributes among the classes analyzed, with good hit rates for some classifiers.

Key words: Classification models; Data dimension reduction; Groupings identification; Projection pursuit.
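
The validation step described in the abstract, comparing the groupings formed with the pre-established classes and deriving hit rates, sensitivity and false positive and false negative rates, can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the cluster assignments and the function names below are hypothetical, and the rule used to pair each grouping with a class (the one-to-one pairing that maximises the number of hits) is an assumption made for illustration only.

```python
from itertools import permutations


def match_clusters_to_classes(true_labels, cluster_ids, classes):
    """Pair each cluster with one class so that the number of hits is maximal.

    Assumption: there are no more clusters than classes. Every one-to-one
    pairing is tried, which is cheap for four groupings (4! = 24 cases).
    """
    clusters = sorted(set(cluster_ids))
    best_map, best_hits = None, -1
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map


def evaluate(true_labels, cluster_ids, classes):
    """Return overall accuracy and per-class rates for a clustering."""
    mapping = match_clusters_to_classes(true_labels, cluster_ids, classes)
    predicted = [mapping[c] for c in cluster_ids]
    pairs = list(zip(predicted, true_labels))
    n = len(pairs)
    accuracy = sum(p == t for p, t in pairs) / n

    per_class = {}
    for k in classes:
        tp = sum(p == k and t == k for p, t in pairs)  # true positives
        fn = sum(p != k and t == k for p, t in pairs)  # false negatives
        fp = sum(p == k and t != k for p, t in pairs)  # false positives
        tn = n - tp - fn - fp                          # true negatives
        per_class[k] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "false_negative_rate": fn / (tp + fn) if tp + fn else float("nan"),
            "false_positive_rate": fp / (fp + tn) if fp + tn else float("nan"),
        }
    return accuracy, per_class


# Hypothetical data: four coffee classes with three instances each, and the
# cluster index that an unsupervised method assigned to every instance.
classes = ["A", "B", "C", "D"]
true_labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
cluster_ids = [2, 2, 0, 0, 0, 0, 1, 1, 3, 3, 3, 3]

accuracy, per_class = evaluate(true_labels, cluster_ids, classes)
print("accuracy:", round(accuracy, 3))
for name, rates in per_class.items():
    print(name, {key: round(value, 3) for key, value in rates.items()})
```

Because cluster indices produced by an unsupervised method carry no class meaning, some pairing rule of this kind is needed before any rate can be computed; other rules, such as majority vote within each cluster, would be equally plausible.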





How to Cite

OSSANI, P. C.; ROSSONI, D. F.; CIRILLO, M. ÂNGELO; BORÉM, F. M. Unsupervised classification of specialty coffees in Homogeneous sensory attributes through machine learning. Coffee Science - ISSN 1984-3909, v. 15, p. e151780, 30 Dec. 2020.