Creating features for machine learning from text
Join us March 10th at 5PM Eastern Time!
Julia Silge is a software engineer at RStudio PBC where she works on open
source modeling tools. She holds a PhD in astrophysics and has worked as a data
scientist in tech and the nonprofit sector, as well as a technical advisory
committee member for the US Bureau of Labor Statistics. She is an author, an
international keynote speaker, and a real-world practitioner focusing on data
analysis and machine learning. Julia loves text analysis, making beautiful
charts, and communicating about technical topics with diverse audiences.
The natural language we use as speakers and writers must be dramatically
transformed into new representations for analysis, whether we are just starting
off with exploratory data analysis or are ready to train machine learning
algorithms such as predictive models. We can explore typical text preprocessing
steps from the ground up, from tokenization to building word embeddings, and
consider the effects of these steps. When are these preprocessing steps
helpful, and when are they not? In this talk, learn about the process of text
preprocessing for ML models in the real world, how and when practitioners make
different preprocessing choices, and considerations for text ML tooling.