Making screenshots of popular websites

from os import mkdir
from subprocess import call

from PIL import Image

urls = (
    'www.google.com', 'www.youtube.com', 'www.facebook.com', 'www.baidu.com',
    'www.wikipedia.org', 'www.yahoo.com', 'www.reddit.com', 'www.qq.com',
    'www.taobao.com', 'www.twitter.com', 'www.amazon.com', 'www.sohu.com',
    'www.live.com', 'www.tmall.com', 'www.vk.com', 'www.instagram.com'
)

screenshot_size = (1024, 768)
default_screenshot_file_name = 'screenshot.png'
store_dir = 'screenshots/'

try:
    mkdir(store_dir)
except FileExistsError:
    pass

for url in urls:
    call('google-chrome --headless --disable-gpu --screenshot --window-size=%d,%d http://%s' % (screenshot_size[0], screenshot_size[1], url), shell=True)
    # Make copy of each newly generated image to preserve it
    im = Image.open(default_screenshot_file_name)
    im.save(store_dir + url.split('.')[1].capitalize() + '.png')
    im.close()

Note that this approach isn't bulletproof. Screenshot generation sometimes fails with a message mentioning "CERT_PKIXVerifyCert", followed by a website name and an error code. When this happens, the script won't continue, so you may want to add a check (for example, print(url) inside the loop) to see which sites are problematic and exclude them from the list. (This website also produced such a message in my test.) When an image is generated successfully, you should see output similar to:

[0505/115257.664234:INFO:headless_shell.cc(377)] Screenshot written to file screenshot.png.
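One way to keep a single failing site from stopping the whole run is to check the outcome of each browser call before touching the image. The following is only a minimal sketch: the helper names build_command and capture are hypothetical, and it assumes that a failed run either returns a non-zero exit code or leaves no screenshot.png behind, which may not hold for every Chrome version.

```python
import os
import subprocess

SCREENSHOT_FILE = 'screenshot.png'  # default name headless Chrome writes to


def build_command(url, size=(1024, 768), browser='google-chrome'):
    """Return the headless Chrome command as an argument list (no shell needed)."""
    return [
        browser, '--headless', '--disable-gpu', '--screenshot',
        '--window-size=%d,%d' % size,
        'http://%s' % url,
    ]


def capture(url, run=subprocess.run, timeout=60):
    """Try to screenshot one site; return True only if a fresh file was written."""
    # Remove any stale screenshot so an old image is never mistaken for a new one.
    if os.path.exists(SCREENSHOT_FILE):
        os.remove(SCREENSHOT_FILE)
    try:
        result = run(build_command(url), timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The browser is missing, not executable, or took too long.
        print('%s: could not run browser (%s)' % (url, exc))
        return False
    return result.returncode == 0 and os.path.exists(SCREENSHOT_FILE)
```

Inside the loop, something like "if not capture(url): continue" would then skip the problematic sites, and collecting those URLs in a list gives you the set to exclude on later runs.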

Depending on your platform, you might need to slightly adjust the command passed to the call function. Another inconvenience is that the script opens the localized version of a website where one is available, so you might not see the language of your choice, and adjusting the locale for each site individually can become tedious. Finally, overusing this technique may burden the sites being requested, so make sure to use it respectfully and without intent to cause harm.
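Two of these points can be sketched in code: locating the browser executable on the current platform, and spacing out the requests. This is an illustration under assumptions, not a definitive recipe. The candidate names below are common defaults rather than an exhaustive list, the two-second delay is an arbitrary choice, and find_browser and paced are hypothetical helper names.

```python
import shutil
import time

# Candidate executable names; common defaults, not an exhaustive list.
BROWSER_CANDIDATES = (
    'google-chrome',
    'google-chrome-stable',
    'chromium',
    'chromium-browser',
    'chrome',
)


def find_browser(candidates=BROWSER_CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def paced(urls, delay_seconds=2.0, sleep=time.sleep):
    """Yield each URL, pausing between them so requests are spread out."""
    for index, url in enumerate(urls):
        if index:  # no pause before the first URL
            sleep(delay_seconds)
        yield url
```

Writing "for url in paced(urls):" in place of "for url in urls:" adds the pause without changing the rest of the loop. If find_browser() returns None, the browser is probably installed outside PATH and its full path has to be supplied by hand.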

Overall, this is a convenient way to generate multiple screenshots without having to visit each website individually. Because the browser runs headless, it is also less resource-intensive than opening every site in a regular browser window.

You can use the screenshots not only in your presentations, but also for computer vision and machine learning tasks, for example to examine which areas are most likely to have content, the average image size, the most frequent colors and more.
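As a starting point for such an analysis, here is a small sketch using Pillow that computes the average image size and the most frequent colors. The function names average_size and dominant_colors are hypothetical, and downscaling to a 64x64 thumbnail before counting colors is an assumption made to keep the count cheap, so the result only approximates the full image.

```python
from PIL import Image


def average_size(paths):
    """Return the mean (width, height) over the given image files, or None."""
    sizes = []
    for path in paths:
        with Image.open(path) as im:
            sizes.append(im.size)
    if not sizes:
        return None
    count = len(sizes)
    return (sum(w for w, _ in sizes) / count, sum(h for _, h in sizes) / count)


def dominant_colors(path, top=3, thumb=(64, 64)):
    """Return the `top` most frequent RGB colors in a downscaled copy."""
    with Image.open(path) as im:
        # NEAREST keeps original pixel values instead of blending neighbors.
        small = im.convert('RGB').resize(thumb, Image.NEAREST)
    # getcolors returns (count, color) pairs; allow one entry per pixel.
    counts = small.getcolors(maxcolors=small.width * small.height)
    counts.sort(key=lambda pair: pair[0], reverse=True)
    return [color for _, color in counts[:top]]
```

Passing the files saved in the screenshots/ directory to average_size, and each of them in turn to dominant_colors, gives a first rough profile of the collected pages.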