Let’s say that you wish to compile a list of the Oscar winners for best picture, along with their director, starring actors, release date, and run time. Using Google, you can see there are several sites that will list these movies by name, and maybe some additional information, but generally you’ll have to follow through with links to capture all the information you want. Scrapy is a popular open-source Python framework for writing scalable web scrapers. In this tutorial, Daniel Ni will take you step by step through using Scrapy to gather a list of Oscar-winning movies from Wikipedia.
Read more…