Emeritus Professor (Computer Science), The University of Waikato
Specialization: Programming by example, text compression, machine learning, data mining, digital libraries, interactive systems.
Title: Big data, deep learning, and Weka
Abstract: When is data “big”? We examine this question with reference to the popular Weka interactive data mining system. The widely used Explorer interface is limited by the fact that datasets must fit into main memory. However, Weka also has facilities that transcend this limitation and can learn from effectively unlimited datasets – which requires machine learning methods that operate incrementally, in one pass through the data. Weka includes incremental implementations of standard classifiers. Its Knowledge Flow and command-line interfaces can be used on datasets of any size. Moa, Weka’s big sister, is expressly designed to work on unlimited data streams, and includes suitable data generators and evaluation methods. Distributed Weka allows Weka to operate on multiprocessor clusters based on either the Hadoop or Spark architectures. We also survey what has been called the “deep learning renaissance”: the application of high-capacity networks to overwhelmingly large quantities of data, particularly in the areas of image recognition, face recognition, and language processing. High-speed GPU implementations are critical to the success of these techniques. Weka supports deep learning with a classifier that applies Deeplearning4j, an open-source program library that includes distributed parallel versions – and the ability to operate on a GPU. This Weka facility is unique in that you can train a deep learning network without writing code. The aim is to defy the Oxford English Dictionary’s definition of big data as “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.”
Biography: Ian H. Witten is a computer scientist at the University of Waikato, New Zealand. He is a Chartered Engineer with the Institution of Electrical Engineers in London. He graduated from the University of Cambridge with a BA and MA (First Class Honours) in mathematics in 1969, and received an M.Sc. in mathematics and computer science from the University of Calgary, where he was a Commonwealth Scholar, in 1970. He received his Ph.D. from the University of Essex, England (Electrical Engineering Science), in 1976, with a thesis titled “Learning to Control”. Witten is a co-creator of the Sequitur algorithm and the original creator of the WEKA software package for data mining. He is a Fellow of the Royal Society of New Zealand and received the Hector Memorial Medal in 2005.