A simple way to collect your deep learning image dataset

Catch Zeng
Published in Analytics Vidhya
2 min read · Feb 24, 2021


Deep learning has become the go-to method for solving many challenging problems. With enough training, a deep network can segment an image and identify the “key points” in it.

A very simple mechanism, scaled up far enough, can have a seemingly magical effect.

Deep learning that works well therefore requires a lot of data. In general, the more training data you have, the better the model's accuracy.

But where do we get all this data from? Well-annotated data can be both expensive and time-consuming to acquire. Hiring people to manually collect and label images is not efficient at all. And in the deep learning era, data is arguably your most valuable resource.

Here, I show a simple way to collect your deep learning image dataset.

bing-images is a Python library that fetches image URLs from Bing.com and downloads the images using multithreading. It has the following features:

  • Supports file type filters.
  • Supports Bing.com filterui filters.
  • Downloads using multithreading with a custom thread pool size.
  • Supports fetching only the image URLs, without downloading.

Demo

Create a demo project, called image-collector here.

Install bing-images
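
A minimal sketch for checking the installation, assuming the package installs from PyPI as `bing-images` and imports as `bing_images` (both names are assumptions based on the project's repository name; the helper functions are hypothetical and only for illustration):

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package_name: str) -> str:
    """Build the pip command for the interpreter that is currently running."""
    return f"{sys.executable} -m pip install {package_name}"


if __name__ == "__main__":
    # "bing_images" (module) and "bing-images" (package) are assumed names.
    if is_installed("bing_images"):
        print("bing_images is available")
    else:
        print("Not installed; try:", pip_install_command("bing-images"))
```

Using `sys.executable -m pip` keeps the install tied to the same interpreter (or virtual environment) that will later run the demo scripts.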

Requirements

Fetch image URLs

fetch_image_urls.py
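
A sketch of what this script might look like. It assumes the library exposes `bing.fetch_image_urls(query, limit=..., file_type=...)` in the `bing_images` module; treat that signature as an assumption and check the project README. `fetch_urls` and `format_urls` are hypothetical helper names.

```python
import sys


def fetch_urls(query, limit=10, file_type="png"):
    """Return a list of image URLs for `query` without downloading anything.

    Assumes bing_images provides bing.fetch_image_urls(query, limit=...,
    file_type=...); verify against the library's README.
    """
    # Imported lazily so format_urls() can be used without the library.
    from bing_images import bing

    return bing.fetch_image_urls(query, limit=limit, file_type=file_type)


def format_urls(urls):
    """Number each URL for printing, one entry per URL, starting at 1."""
    return [f"{i}: {url}" for i, url in enumerate(urls, start=1)]


if __name__ == "__main__":
    if len(sys.argv) > 1:
        urls = fetch_urls(sys.argv[1])
        print(f"{len(urls)} images.")
        print("\n".join(format_urls(urls)))
    else:
        print("usage: python fetch_image_urls.py <query>")
```

Saved as fetch_image_urls.py, this sketch would be invoked as `python fetch_image_urls.py cat`.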

Run

Download using multithreading

download.py
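
A sketch of a download script, assuming the library exposes `bing.download_images(query, limit, output_dir=..., pool_size=..., file_type=...)`; the signature is an assumption to confirm in the project README. The per-query folder layout and the helper names `output_dir_for` and `download` are hypothetical.

```python
import sys
from pathlib import Path


def output_dir_for(query, base_dir="dataset"):
    """Map a search query to a per-class folder: 'Tabby Cat' -> dataset/tabby_cat."""
    folder = "_".join(query.lower().split())
    return Path(base_dir) / folder


def download(query, limit=20, pool_size=10, base_dir="dataset"):
    """Download up to `limit` images for `query` using `pool_size` threads.

    Assumes bing_images provides bing.download_images(query, limit,
    output_dir=..., pool_size=..., file_type=...).
    """
    # Imported lazily so output_dir_for() works without the library.
    from bing_images import bing

    out = output_dir_for(query, base_dir)
    out.mkdir(parents=True, exist_ok=True)
    bing.download_images(query, limit, output_dir=str(out),
                         pool_size=pool_size, file_type="png")
    return out


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print("Saved to", download(sys.argv[1]))
    else:
        print("usage: python download.py <query>")
```

Keeping one folder per query gives a directory-per-class layout, which many training pipelines can read directly. A larger `pool_size` means more parallel downloads.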

Run

Download square black-white images

download-square.py
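
A sketch of combining Bing filterui filters to get square, black-and-white images. It assumes `bing.download_images` accepts a `filters` string of the form `+filterui:aspect-square+filterui:color2-bw`; that parameter and the helper names `build_filters` and `download_square_bw` are assumptions to verify against the project README.

```python
import sys


def build_filters(*names):
    """Join filterui names into the '+filterui:a+filterui:b' form."""
    return "".join(f"+filterui:{name}" for name in names)


# Square aspect ratio plus black-and-white colour filter.
SQUARE_BW = build_filters("aspect-square", "color2-bw")


def download_square_bw(query, limit=20, output_dir="square_bw", pool_size=10):
    """Download square black-and-white images for `query`.

    Assumes bing_images provides bing.download_images(..., filters=...).
    """
    # Imported lazily so build_filters() works without the library.
    from bing_images import bing

    bing.download_images(query, limit, output_dir=output_dir,
                         pool_size=pool_size, file_type="png",
                         filters=SQUARE_BW)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        download_square_bw(sys.argv[1])
    else:
        print("usage: python download-square.py <query>")
```

Other filterui values can be passed to `build_filters` in the same way to narrow the results further.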

The full code is at https://github.com/CatchZeng/bing_images. See you!
