Introductory Data Science Instruction: Misconceptions

Faculty Sponsor

Karl Schmitt


Arts and Sciences


Data Science

ORCID Identifier(s)

Cody Packer, 0000-0001-9385-6868; Terry Wade, 0000-0002-3726-0709; Nathan Randle, 0000-0003-3578-2025

Presentation Type

Poster Presentation

Symposium Date

Summer 7-29-2019


Data science is an emerging field, blossoming from the large amounts of data that can now be collected thanks to our ever-increasing use of computers. Many institutions have created data science courses to begin training students; however, because the field is constantly changing, two key issues arise: there is likely a disconnect between instructors and working professionals, and students of data science bring misconceptions and previous knowledge from other courses. This project aims to identify disconnects between the core elements of data science curricula and what early-career data science practitioners use in their daily work, and to identify topics in courses that establish early data science knowledge. College catalogs are parsed for courses and their descriptions. Each description is reduced to its key words, which are marked as relevant or not relevant to data science. Several machine learning algorithms, such as random forests and decision trees, will be used on this data to discover courses outside the data science curriculum that contain data science ideas. We also worked toward surveying practitioners about standard data science concepts. This survey took an established body of knowledge and evaluated how strongly the data science community supports each topic's validity in data science, as well as how each topic is used by working professionals. So far, a handful of topics in the body of knowledge have been labeled as not data science, and the survey has produced favorable results that show which topics are and are not well suited to an introductory data science course. The codebase in development looks promising and is quickly approaching the point where a full suite of machine learning algorithms can be implemented and tested for accuracy and precision. Once complete, the machine learning classification algorithm will be able to outline the topics that are used within data science courses. This allows future work to further analyze the output topics and later design courses.
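The keyword-reduction step described in the abstract might look roughly like the sketch below. It is only an illustration under stated assumptions: the stop-word list, the data science vocabulary, the function names `extract_keywords` and `flag_relevance`, and the sample catalog entry are all hypothetical and are not taken from the project's codebase. The resulting keyword flags are the kind of features that could later be passed to a classifier such as a decision tree or random forest.

```python
import re

# Hypothetical stop-word list and data science vocabulary; both are
# illustrative assumptions, not the lists used in the project.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on"}
DATA_SCIENCE_TERMS = {"data", "statistics", "regression", "visualization",
                      "classification", "probability"}


def extract_keywords(description):
    """Lowercase a description, split it into words, and drop stop words.

    Duplicate words are removed while the original order is kept.
    """
    words = re.findall(r"[a-z]+", description.lower())
    keywords = []
    for word in words:
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords


def flag_relevance(keywords):
    """Pair each keyword with True if it is in the assumed vocabulary."""
    return [(word, word in DATA_SCIENCE_TERMS) for word in keywords]


# Made-up catalog entry, for illustration only.
description = ("Introduction to regression and data visualization "
               "for the social sciences.")
keywords = extract_keywords(description)
flags = flag_relevance(keywords)

for word, relevant in flags:
    print(f"{word}: {'relevant' if relevant else 'not relevant'}")
```

A fixed vocabulary keeps the sketch simple; in practice the relevance labels would more likely come from manual annotation or from the surveyed body of knowledge.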

Biographical Information about Author(s)

Cody Packer is a junior computer science major involved in data science research on network graphs, with experience in back-end web development.

Terry Wade is a junior computer science and data science double major with a mathematics minor.
