Text Analysis Protocol

Workshop Material

Statistical Foundation
Data Mining
Data Wrangling
Kansas State University
Author

Lior Shamir

Material Description

The purpose of this document is to show how to perform automatic classification and analysis of text files. Automatic classification is done by a computer program “reading” the files without human intervention. In its most basic form, machine-learning classification of text files first converts each text file into a set of numerical values that describe it. The program then identifies recurring patterns in these numbers and uses them to classify or annotate the text files automatically.
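The two steps described above (turning each text into numbers, then using patterns in those numbers to classify) can be illustrated with a minimal sketch. It is not the workshop's own method: the choice of relative word frequencies as the numerical values, the nearest-centroid rule, the function names, and the tiny training set are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def text_to_features(text, vocabulary):
    """Describe a text as relative word frequencies over a fixed vocabulary."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in vocabulary]


def train_centroids(labeled_texts, vocabulary):
    """Average the feature vectors of each class into one centroid per class."""
    sums = {}
    sizes = Counter()
    for text, label in labeled_texts:
        vector = text_to_features(text, vocabulary)
        previous = sums.get(label, [0.0] * len(vocabulary))
        sums[label] = [p + v for p, v in zip(previous, vector)]
        sizes[label] += 1
    return {
        label: [value / sizes[label] for value in total]
        for label, total in sums.items()
    }


def classify(text, centroids, vocabulary):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    vector = text_to_features(text, vocabulary)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Hypothetical training data, invented for this example only.
training = [
    ("the team won the game with a late goal", "sports"),
    ("the player scored a goal in the match", "sports"),
    ("the bank raised interest rates on loans", "finance"),
    ("the market fell as interest rates rose", "finance"),
]

# The vocabulary is every distinct word seen in the training texts.
vocabulary = sorted({word for text, _ in training for word in tokenize(text)})
centroids = train_centroids(training, vocabulary)

print(classify("a goal in the final game", centroids, vocabulary))
print(classify("interest rates on loans rose", centroids, vocabulary))
```

Real text collections usually call for richer numerical descriptors and a more capable classifier, but the overall flow of converting each file to numbers and then classifying by patterns in those numbers stays the same.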

Module Materials

You can access the material here

This material is licensed under the CC BY license
