Filtering COMPS Chat Transcripts for Computer Modeling Using Common Vocabulary

Faculty Sponsor

Michael Glass


Arts and Sciences


Computing and Information Sciences

Presentation Type

Poster Presentation

Symposium Date

Summer 7-31-2017


The Computer Mediated Problem Solving (COMPS) project aims to create a web-delivered problem-solving environment for student collaborative learning. One feature will be real-time computer monitoring of chat dialogs, informing instructors of the status and productivity of student discussions. This work addresses challenges in preparing typed chat from a variety of student exercises for computer analysis. Computer identification of productive chat will use a vocabulary of common English words that are not tied to any specific student exercise. Here we report on software and data resources for maintaining lexicons harvested from chat transcripts. This software aids in discovering vocabulary common to diverse chatting milieus and makes the vocabularies available for research and for machine processing of the text. Typed chat also presents lexical challenges, among them words stretched looooonger or *starred* for emphasis, along with rampant spelling errors and abbreviations. Algorithms developed for these issues are presented here.
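
As a rough illustration of the kind of processing described in the abstract, the following Python sketch strips *starred* emphasis, collapses stretched words against a known lexicon, and intersects the vocabularies of several transcripts. It is not the COMPS software: the function names, the lexicon-lookup heuristic for stretched words, and the sample data are all assumptions made for this example, and spelling errors and abbreviations are not handled.

```python
import re

# All names below are hypothetical; this is not the COMPS implementation.
_STARRED = re.compile(r"\*(\w+)\*")   # *word* used for emphasis
_STRETCH = re.compile(r"(.)\1{2,}")   # any character repeated 3+ times
_TOKEN = re.compile(r"[a-z']+")       # crude word tokenizer


def strip_emphasis(text):
    """Remove the asterisks around *starred* words."""
    return _STARRED.sub(r"\1", text)


def collapse_stretch(word, lexicon):
    """Map a stretched word such as 'looooonger' to a lexicon entry.

    Assumed heuristic: shorten each run of 3+ identical characters to
    two, then to one, and accept the first result found in the lexicon.
    If neither is found, fall back to the single-character form.
    """
    if word in lexicon:
        return word
    for replacement in (r"\1\1", r"\1"):
        candidate = _STRETCH.sub(replacement, word)
        if candidate in lexicon:
            return candidate
    return _STRETCH.sub(r"\1", word)


def common_vocabulary(transcripts, lexicon):
    """Return the words that occur in every transcript.

    `transcripts` maps an exercise name to a list of chat lines.
    """
    vocab_sets = []
    for lines in transcripts.values():
        words = set()
        for line in lines:
            cleaned = strip_emphasis(line.lower())
            for token in _TOKEN.findall(cleaned):
                words.add(collapse_stretch(token, lexicon))
        vocab_sets.append(words)
    return set.intersection(*vocab_sets) if vocab_sets else set()


if __name__ == "__main__":
    lexicon = {"so", "the", "is", "longer", "cool"}
    chats = {
        "exercise_1": ["the loop is sooo slow"],
        "exercise_2": ["So the answer is *four*"],
    }
    print(sorted(common_vocabulary(chats, lexicon)))
```

Checking candidates against a lexicon is one way to tell a stretched word from a legitimate double letter: "coooool" resolves to "cool" on the two-letter pass, while "looooonger" only resolves to "longer" on the one-letter pass.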

Biographical Information about Author(s)

Nathaniel Bouman is a rising senior with a double major in Computer Science and Physics. He first became involved with the COMPS project during the fall 2016 semester and was happy to work with the group again this summer. His interest stemmed from the opportunity to do research and write algorithms for a project ultimately aimed at aiding students as they learn. He is still unsure of his plans after graduation, but hopes either to continue his education in graduate school or to work in software development.
