There's nothing worse than having to pore over a pile of PDFs made up of scanned document images when you need to find a specific file quickly. Dropbox is making that easier by introducing automatic image recognition, which extracts text from photos and PDFs and makes it searchable. According to the cloud storage provider, there are 20 billion image and PDF files stored on Dropbox. Around 10 to 20 percent of those are photos of documents, so the new feature could prove useful for a sizable share of them.

To look for a specific photo or PDF, you simply type in a keyword or phrase like you would on a search engine. Dropbox will then show you the files that contain those words or phrases. The company told VentureBeat that this is "the most computationally intensive project its machine learning team has ever undertaken." The team was particularly challenged by PDFs, since multi-page documents require a lot more processing power than a single image file. To make indexing them feasible, the team designed the system to stop extracting and indexing text after 10 pages.

Automatic image text recognition works for English-language JPEG, static GIF, PNG, TIFF and PDF files on Dropbox, even those uploaded before the service rolled out the feature. However, its availability is fairly limited. Dropbox Business Advanced and Enterprise users might be able to access it soon, depending on when their account administrators switch it on, while Dropbox Professional subscribers will get the feature in the coming months. Ordinary users will have to keep looking for documents the old-fashioned way.

Source: Dropbox