Tutorials

 

PLEASE NOTE: It is important to book a seat by filling in the form available at Tutorials - BRACIS 2017.

The tutorial form will be active until Wednesday, September 27th. After this date you will have to check the availability of each room at the on-site reception.

 

BRACIS'2017 TUTORIALS

 

TUTORIAL 1

Understanding classifier performance: ROC analysis and beyond

Peter Flach

Monday, October 2, 2017, 8:30am – 12:00pm

Machine learning, broadly defined as data-driven technology to enhance human decision making, is already in widespread use and will soon be ubiquitous and indispensable in all areas of human endeavour. Data is collected routinely in all areas of significant societal relevance including law, policy, national security, education and healthcare, and machine learning informs decision making by detecting patterns in the data. Achieving transparency, robustness and trustworthiness of these machine learning applications is hence of paramount importance, and evaluation procedures and metrics play a key role in this. In this tutorial I will review important issues in theory and practice of evaluating predictive machine learning models. Many issues arise from a limited appreciation of the importance of the scale on which metrics are expressed. For example, it is OK to use the arithmetic average for aggregating accuracies achieved over different test sets but not for aggregating F-scores. These and related issues will be discussed within the context of ROC analysis, which provides a general framework for analysing per-class performance of classifiers.
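The aggregation point in the abstract above can be illustrated with a small sketch. It is not taken from the tutorial material: the confusion-matrix counts are hypothetical, and the two test sets are assumed to have the same size. Under that assumption the arithmetic mean of the per-set accuracies equals the accuracy on the pooled data, while the mean of the per-set F-scores does not equal the pooled F-score.

```python
import math
from statistics import mean

def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified instances."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn, tn):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical (TP, FP, FN, TN) counts for two test sets of 100 instances each.
test_sets = [(40, 10, 10, 40), (5, 5, 5, 85)]

# Pool the two sets by summing the counts cell by cell.
pooled = tuple(sum(cell) for cell in zip(*test_sets))

mean_acc = mean(accuracy(*s) for s in test_sets)
pooled_acc = accuracy(*pooled)
mean_f1 = mean(f1(*s) for s in test_sets)
pooled_f1 = f1(*pooled)

print(f"accuracy: mean of sets {mean_acc:.2f}, pooled {pooled_acc:.2f}")
print(f"F1-score: mean of sets {mean_f1:.2f}, pooled {pooled_f1:.2f}")
print("accuracy agrees:", math.isclose(mean_acc, pooled_acc))
print("F1 agrees:", math.isclose(mean_f1, pooled_f1))
```

The second test set has far fewer positives than the first, which is what pulls the averaged F-score away from the pooled one.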

 

TUTORIAL 2

AutoML: Automating Machine Learning

André Carlos Ponce de Leon Ferreira de Carvalho

Monday, October 2, 2017, 8:30am – 12:00pm

Over the last few decades, a large number of machine learning algorithms and data preprocessing techniques have been proposed. While this abundance widens the range of choices, it also raises the challenge of deciding which algorithm and technique to use for a new dataset. In addition, each algorithm and technique has parameters that need to be tuned. As a result, the performance of a model induced by a machine learning algorithm is strongly influenced by the knowledge and experience of the developer. AutoML makes it possible to automate several steps in applying machine learning techniques to new datasets. It can recommend not only algorithms, techniques and parameter values, but also post-processing and result-analysis techniques. This tutorial will present the main concepts used in AutoML and show some examples of how it can be used.
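As a rough illustration of the joint choice of algorithm and parameter values described above, the sketch below exhaustively evaluates a tiny configuration space on a hold-out set. It is a deliberately simplified stand-in, not the method presented in the tutorial: the toy data, the two candidate "algorithms" (`knn_predict`, `threshold_predict`) and their parameter grids are all hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def threshold_predict(train, x, t):
    """Predict class 1 when the feature exceeds t (ignores the training set)."""
    return 1 if x > t else 0

def holdout_accuracy(predict, param, train, valid):
    """Accuracy of one (algorithm, parameter) configuration on held-out data."""
    hits = sum(predict(train, x, param) == y for x, y in valid)
    return hits / len(valid)

# Hypothetical toy data: class 1 tends to have larger feature values.
rng = random.Random(0)
data = [(rng.gauss(2.0 * label, 1.0), label) for label in (0, 1) for _ in range(100)]
rng.shuffle(data)
train, valid = data[:150], data[150:]

# Search space: (name, prediction function, candidate parameter values).
search_space = [
    ("knn", knn_predict, [1, 3, 5, 9, 15]),
    ("threshold", threshold_predict, [0.0, 0.5, 1.0, 1.5, 2.0]),
]

# Evaluate every configuration and keep the best-scoring one.
results = [
    (holdout_accuracy(fn, p, train, valid), name, p)
    for name, fn, params in search_space
    for p in params
]
best_score, best_algo, best_param = max(results)
print(f"best: {best_algo} with parameter {best_param} (accuracy {best_score:.2f})")
```

Practical AutoML systems typically replace the exhaustive loop with a more economical search strategy and use meta-knowledge from earlier datasets; the loop here only shows the shape of the problem.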

 

TUTORIAL 3

Hands-on Tutorial on MOA

Albert Bifet

Monday, October 2, 2017, 01:30pm – 05:00pm

Fast Big Data is being produced at high velocity and in real time. To deal effectively with this type of streaming data, we need to be able to adapt to changes in the distribution of the data being produced, and we need to do so using the minimum amount of time and memory. The Internet of Things (IoT) is a good example of, and motivation for, this type of streaming data. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling up the implementation of state-of-the-art algorithms to real-world dataset sizes. MOA includes classification and clustering methods. It contains a collection of offline and online methods as well as tools for evaluation. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license. MOA is the most popular software in data stream mining, with more than 10,000 downloads/year and a community of more than 50 developers (http://moa.cms.waikato.ac.nz/).
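The need to adapt to a changing distribution with bounded memory can be sketched in a few lines. The example below does not use MOA or its API (MOA itself is a Java framework); it is a hypothetical Python illustration of test-then-train (prequential) evaluation, in which each arriving label is first predicted and then used for learning. The learner, the window size and the synthetic stream are assumptions made for this sketch.

```python
from collections import Counter, deque

class MajorityClass:
    """Predicts the most frequent label among the stored labels.

    With window=None every label is kept (unbounded memory, no forgetting);
    with an integer window only the most recent labels are kept.
    """
    def __init__(self, window=None):
        self.recent = deque(maxlen=window)

    def predict(self):
        if not self.recent:
            return 0  # arbitrary default before any label has been seen
        # Recounting on every call keeps the sketch short, not fast.
        return Counter(self.recent).most_common(1)[0][0]

    def learn(self, label):
        self.recent.append(label)

def prequential_accuracy(learner, stream):
    """Test-then-train: predict each label first, then learn from it."""
    correct = 0
    for label in stream:
        correct += learner.predict() == label
        learner.learn(label)
    return correct / len(stream)

# Synthetic stream with an abrupt change: mostly 0s, then mostly 1s.
before = [1 if i % 10 == 0 else 0 for i in range(500)]
after = [0 if i % 10 == 0 else 1 for i in range(500)]
stream = before + after

full_acc = prequential_accuracy(MajorityClass(window=None), stream)
windowed_acc = prequential_accuracy(MajorityClass(window=50), stream)
print(f"no forgetting: {full_acc:.2f}, sliding window of 50: {windowed_acc:.2f}")
```

The windowed learner recovers shortly after the change while the non-forgetting one keeps predicting the old majority class, which is the behaviour that adaptive stream learners are designed to avoid.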

 

TUTORIAL 4

Automatic Language Identification in Microblogs

Clarissa Castellã Xavier

Monday, October 2, 2017, 01:30pm – 05:00pm

Noisy text is found in informal settings such as microblogs. Such text may contain spelling errors, abbreviations, non-standard terminology, missing punctuation, misleading case information, as well as false starts, repetitions, and special characters (Contractor et al., 2010). The tweet “Much lov 4 u tmorrow” is noisy because some of the words are contracted, misspelled or replaced by symbols. According to Lui, Lau and Baldwin (Lui et al., 2014), “language identification is the task of automatically detecting the language(s) present in a document based on the content of the document”. The language identification task has been considered solved for long documents written in a single language, among other conditions (McNamee, 2005). However, according to Zubiaga et al. (2014), “the emergence of social media and the chatspeak employed by its users has brought about new previously unseen issues that need to be studied in order to deal with these kinds of texts”. We will focus on fully automatic machine learning approaches to identifying the language of short, noisy texts such as Twitter posts, based on TF-IDF and deep learning techniques.
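One simple way to apply TF-IDF to this task is to weight character n-grams rather than words, since short noisy texts rarely match a word list. The sketch below is an assumption-laden illustration and not the approach taught in the tutorial: the training samples are tiny and hypothetical, each language is treated as a single document, and classification is by cosine similarity to each language profile.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lower-cased, space-padded text."""
    text = " " + text.lower() + " "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Hypothetical, tiny training samples; a real system needs far more data.
samples = {
    "en": "the quick brown fox jumps over the lazy dog and what are you doing today with the people",
    "pt": "o que você está fazendo hoje com as pessoas não sei mas amanhã vou trabalhar",
    "es": "qué estás haciendo hoy con la gente no lo sé pero mañana voy a trabajar",
}

counts = {lang: Counter(char_ngrams(text)) for lang, text in samples.items()}
doc_freq = Counter(gram for c in counts.values() for gram in c)
n_docs = len(samples)

def idf(gram):
    """Smoothed IDF, so n-grams unseen in training still get a finite weight."""
    return math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0

def tfidf(counter):
    """Raw term frequency times smoothed inverse document frequency."""
    return {gram: tf * idf(gram) for gram, tf in counter.items()}

def cosine(a, b):
    dot = sum(w * b.get(gram, 0.0) for gram, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

profiles = {lang: tfidf(c) for lang, c in counts.items()}

def identify(text):
    """Return the language whose profile is most similar to the text."""
    query = tfidf(Counter(char_ngrams(text)))
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))

print(identify("what r u doing"))
print(identify("você está trabalhando hoje"))
```

Character trigrams still overlap with the training text when words are abbreviated, which is why the noisy first query can be matched at all; with samples this small, however, results on arbitrary input should not be trusted.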

 

SHORT BIO

 

Albert Bifet is Associate Professor at Telecom ParisTech and Honorary Research Associate at the WEKA Machine Learning Group at the University of Waikato. Previously he worked at Huawei Noah's Ark Lab in Hong Kong, Yahoo Labs in Barcelona, the University of Waikato and UPC BarcelonaTech. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He has served as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of BigMine (2015, 2014, 2013, 2012) and of the ACM SAC Data Streams Track (2017, 2016, 2015, 2014, 2013, 2012).

 

André Carlos Ponce de Leon Ferreira de Carvalho is Full Professor at the Institute of Mathematical and Computer Sciences, Department of Computer Science, University of São Paulo (USP), São Carlos campus, where he is also director of the Center for Machine Learning in Data Analysis, and he holds a CNPq Research Productivity fellowship, level 1A. He received his undergraduate degree (1987) and his master's degree in Computer Science (1990) from the Universidade Federal de Pernambuco, and his doctorate in Electronic Engineering from the University of Kent at Canterbury (1994). He has more than 300 publications in Data Science, Machine Learning and Data Mining, including 10 best papers, at conferences organized by ACM, IEEE and SBC. He has supervised more than 25 doctoral theses at different universities in Brazil and Portugal, as well as about 15 postdoctoral researchers. He serves on the editorial boards and program committees of the leading journals and conferences in Artificial Intelligence, Data Science, Data Mining and Machine Learning, such as AAAI, KDD, ECML/PKDD, IJCAI and SDM. He is an ad hoc reviewer for several national and international research funding agencies. He is vice-director of the Center for Mathematical Sciences Applied to Industry. His main research interests are Machine Learning, Data Mining and Data Science, working mainly on the following topics: novelty detection, meta-learning, data preprocessing and metaheuristics, with applications in Bioinformatics, Engineering, Finance, Medicine and the Environment.

 

Clarissa Castellã Xavier currently leads NLP research and development at Meedan, working mainly on social media and networks, language detection, collaborative translation, fact checking, semantic analysis and information extraction. Clarissa started her research in NLP in 1999 at the PUCRS NLP Group and received her PhD from the same institution in 2014. Since then, she has been working for worldwide multicultural companies developing language processing tools. www.clarissacx.com; www.meedan.com.

 

Peter Flach has been Professor of Artificial Intelligence at the University of Bristol since 2003. An internationally leading researcher in the areas of mining highly structured data and the evaluation and improvement of machine learning models using ROC analysis, he has also published on the logic and philosophy of machine learning, and on the combination of logic and probability. He is the author of Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: the Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012). Professor Flach is the Editor-in-Chief of the Machine Learning journal, one of the two top journals in the field, which has been published for over 25 years, first by Kluwer and now by Springer. He was Programme Co-Chair of the 1999 International Conference on Inductive Logic Programming, the 2001 European Conference on Machine Learning, the 2009 ACM Conference on Knowledge Discovery and Data Mining, and the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases in Bristol.