Creating features for machine learning from text
Join us March 10th at 5PM Eastern Time!
Julia Silge is a software engineer at RStudio PBC where she works on open
source modeling tools. She holds a PhD in astrophysics and has worked as a data
scientist in tech and the nonprofit sector, as well as a technical advisory
committee member for the US Bureau of Labor Statistics. She is an author, an
international keynote speaker, and a real-world practitioner focusing on data
analysis and machine learning. Julia loves text analysis, making beautiful
charts, and communicating about technical topics with diverse audiences.
The natural language we use as speakers and writers must be dramatically transformed into new representations before it can be analyzed, whether we are just starting exploratory data analysis or are ready to train predictive machine learning models. We can explore typical text preprocessing steps from the ground up, from tokenization to building word embeddings, and consider the effects of each step. When are these preprocessing steps helpful, and when are they not? In this talk, learn how text preprocessing for ML models works in the real world, how and when practitioners make different preprocessing choices, and what to consider when choosing text ML tooling.
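As a rough illustration of the kind of preprocessing the talk covers, here is a minimal Python sketch that tokenizes a sentence and counts tokens with and without stop words. It is not taken from the talk: the function names, the regular expression, and the tiny stop word list are all assumptions made for this example, and real projects would typically use a dedicated text processing library.

```python
import re
from collections import Counter

# Hypothetical stop word list for illustration only; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "are"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def featurize(text, remove_stop_words=True):
    """Return bag-of-words token counts, optionally dropping stop words."""
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)


doc = "The cat and the dog are friends of the bird."
with_stop = featurize(doc, remove_stop_words=False)
without_stop = featurize(doc)

print(with_stop.most_common(3))
print(sorted(without_stop))
```

Comparing the two results shows how one preprocessing choice changes the features a model would see: with stop words kept, the most frequent feature is "the", while removing them leaves only the content words.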