Text Analysis Protocol
Workshop Material
Statistical Foundation
Data Mining
Data Wrangling
Kansas State University
Material Description
The purpose of this document is to show how to perform automatic classification and analysis of text files. In automatic classification, a computer “reads” the files without human intervention. In its most basic form, machine-learning text classification first converts each text file into a set of numerical values that describe it. The program then identifies recurring patterns in these numbers and uses them to classify or annotate the text files automatically.
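As a rough illustration of the two steps described above (turning text into numbers, then using patterns in those numbers to classify), the following minimal Python sketch may help. It is not the workshop's own method: the word-count representation, the cosine-similarity comparison, the function names, and the tiny training set are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vectorize(text):
    """Convert a text into numerical values: a count for each word."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labeled_texts):
    """Sum the word counts of all training texts that share a label."""
    profiles = {}
    for text, label in labeled_texts:
        profiles.setdefault(label, Counter()).update(vectorize(text))
    return profiles


def classify(text, profiles):
    """Return the label whose word-count profile is most similar to the text."""
    vec = vectorize(text)
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Hypothetical toy data, invented for this illustration only.
TRAINING = [
    ("the team won the match with a late goal", "sports"),
    ("the striker scored twice in the final game", "sports"),
    ("the new processor doubles the memory bandwidth", "tech"),
    ("the software update fixes a security bug", "tech"),
]

profiles = train(TRAINING)
prediction = classify("a late goal decided the game", profiles)
print(prediction)
```

Real text collections would normally call for a larger training set and a more careful representation, for example down-weighting very common words such as "the", but the overall flow of converting text to numbers and then matching patterns stays the same.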
Module Materials
You can access the material here
This material is licensed under CC BY.