Determining a Short List of Amazon Reviews using Statistical Text Analytics and Optimization
Customers want to make informed purchasing decisions without a lot of effort. Online reviews can be helpful, but which reviews are worth reading? What if we could say, “Read these 3 reviews; they contain essentially the same information as all 4000 reviews”? Dr. Douglas Kline will present a method for selecting subsets of reviews that are similar to the set of all reviews, using document-term frequency analysis and integer programming optimization. The technique is demonstrated on three Amazon products with 2000+ reviews: a book, an article of clothing, and an electronic device. Dr. Kline will also cover the basics of statistical text analysis, along with some of the pros and cons of typical pre-processing methods, such as stop-word removal and stemming, when applied across corpora. The project is not yet completed or published, but preliminary analysis shows how language differs across the product reviews, why pre-processing must be done carefully, and the dimensionality challenge of evaluating subsets drawn from a large set.
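For readers who want a feel for the idea before the talk, here is a minimal Python sketch of subset selection by term frequency. It is not Dr. Kline's method: it swaps integer programming for a brute-force search over all subsets, and the function names (term_freq, distance, best_subset), the L1 distance measure, the whitespace tokenizer, and the sample reviews are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def term_freq(texts, stop_words=frozenset()):
    """Relative term frequencies pooled over a list of review strings.

    Tokenization is a plain lowercase whitespace split (an assumption);
    words in stop_words are dropped before counting.
    """
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in stop_words
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def distance(p, q):
    """L1 distance between two term-frequency distributions."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


def best_subset(reviews, k, stop_words=frozenset()):
    """Indices of the k reviews whose pooled term frequencies are closest
    to those of the full set, found by exhaustive search (not integer
    programming)."""
    target = term_freq(reviews, stop_words)
    return min(
        combinations(range(len(reviews)), k),
        key=lambda idx: distance(
            term_freq([reviews[i] for i in idx], stop_words), target
        ),
    )


if __name__ == "__main__":
    # Hypothetical reviews, invented for this example.
    sample = [
        "the fit is great",
        "the zipper broke",
        "great fit but the zipper broke",
        "arrived late",
    ]
    print(best_subset(sample, 2, stop_words=frozenset({"the", "is", "but"})))
```

Exhaustive search only works for toy inputs. Choosing 3 reviews out of 4000 already means about 10.7 billion candidate subsets, which is the dimensionality challenge the abstract mentions and one reason to reach for an optimization formulation instead.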
Douglas Kline is a Professor of Information Systems at UNC Wilmington. He consults on SQL Server internals and performance tuning, and also writes and produces videos as Database by Doug.
Doors open at 6:30. Talk begins at 7:00.