Real-Time Document Image Retrieval with LLAH

Last update

This page: July 12, 2007
Software: January 26, 2007 (ver.1.1)
Video: January 26, 2007
Software: May 23, 2022

What's this?

This page explains a new method of real-time document image retrieval. It takes images captured by a web camera as input and retrieves the corresponding pages from a large-scale document image database (DB). The core of the method is LLAH (Locally Likely Arrangement Hashing), an algorithm invented in our research group.
A short video introducing the system is available here.

What is the task?

The method views a document image as a collection of feature points, so the retrieval task is to find the page whose feature points have a similar arrangement. Take a look at the images below. The query image is converted to a set of feature points, which are then matched against the feature points of the pages in the DB.
Although the task is not easy for a human, a machine can accomplish it easily with the help of LLAH. Can you find the correct answer?


[Figure: a query image and its feature points]

[Figure: the query feature points and the pages in the DB]

For each point extracted from the query image, the method with LLAH tries to find the corresponding point on a page. A query image contains about 400 points and a page in the DB about 600, so a DB of 10,000 pages requires 2,400,000,000 (= 400 × 600 × 10,000) point comparisons, far too many for brute-force matching to run in real time. Note also that

the query image is captured under perspective distortion and different lighting conditions, and
only a part of the page, not the whole page, may be captured.

These make matching much harder.
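
To make the brute-force count above concrete, the following minimal Python sketch reproduces the arithmetic and contrasts it with a generic hash-table lookup that votes for pages. The function names, the toy integer descriptors, and the voting scheme are illustrative assumptions, not the actual LLAH implementation; how LLAH computes its descriptors is not covered on this page.

```python
from collections import Counter, defaultdict


def brute_force_comparisons(query_points, points_per_page, pages):
    """Number of point-to-point comparisons when every pair is tested."""
    return query_points * points_per_page * pages


def build_index(db):
    """Map each descriptor to the set of pages that contain it.

    `db` is a hypothetical dict: page id -> iterable of hashable descriptors.
    """
    index = defaultdict(set)
    for page_id, descriptors in db.items():
        for descriptor in descriptors:
            index[descriptor].add(page_id)
    return index


def retrieve(index, query_descriptors):
    """Return the page with the most descriptor matches, or None."""
    votes = Counter()
    for descriptor in query_descriptors:
        # .get avoids creating empty entries for unknown descriptors.
        for page_id in index.get(descriptor, ()):
            votes[page_id] += 1
    return votes.most_common(1)[0][0] if votes else None


print(brute_force_comparisons(400, 600, 10_000))  # 2400000000

# Toy data (assumed): integers stand in for point descriptors.
toy_db = {"page_a": [11, 12, 13, 14], "page_b": [21, 22, 23, 13]}
index = build_index(toy_db)
# A partial, noisy query: three descriptors from page_a, one unknown.
print(retrieve(index, [12, 13, 14, 99]))  # page_a
```

With an index of this kind, the work per query grows roughly with the number of query points rather than with the total number of points in the DB, and a partially captured page can still win the vote.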

What has been achieved?

The method is characterized by:

Fast: retrieval from a DB of 10,000 pages takes only 2/7 [sec./query] (1/7 [sec./query] with a simple pipelining technique).
Robust: it works under perspective distortion, uneven lighting, and non-linear deformation of page surfaces (see the pictures below).
Accurate: the recognition rate (correct retrieval rate) is more than 93%.
Low resolution: it works even with a cheap web camera (1.3 M pixels; $100).
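
The timing figures above can be turned into throughput with a few lines of Python. The sketch assumes that the pipelined version splits retrieval into two stages of 1/7 second each that run in parallel; the page gives only the two totals, so the stage split is an assumption made for illustration.

```python
from fractions import Fraction

sequential_latency = Fraction(2, 7)  # seconds per query, as stated on the page
pipelined_interval = Fraction(1, 7)  # seconds per query with pipelining

# Assumed two-stage split (hypothetical): the stages sum to the sequential latency.
stages = [Fraction(1, 7), Fraction(1, 7)]
assert sum(stages) == sequential_latency
# In a pipeline, a new result is produced every max(stage) seconds.
assert max(stages) == pipelined_interval


def queries_per_second(seconds_per_query):
    """Throughput is the reciprocal of the time between results."""
    return 1 / seconds_per_query


print(float(queries_per_second(sequential_latency)))  # 3.5
print(float(queries_per_second(pipelined_interval)))  # 7.0
```

Under this assumption, pipelining doubles the throughput while the latency of an individual query stays at 2/7 second.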

Examples of query images correctly recognized by the method are listed below. Clicking an image shows the original image used for the retrieval.
See the videos as well.


perspective distortion	partial capture	occlusion	non-linear deformation

The system not only finds the corresponding page in the DB but can also, for example, display information on the retrieved page, as shown in the augmented reality video. The following figure illustrates this functionality. With the system, pages can be regarded as media for displaying various kinds of information, such as diagrams, text, still images, and movies (like a newspaper in the Harry Potter movies). You can also establish a link from a real page to the Internet.


Can I try it?

Yes! If you have a web camera (a 1.3 M pixel camera is preferable) and a Windows computer (a dual-CPU machine or two computers are preferable), you can use the system available below.

Software download

version: 1.1
size: 5.4 MB
requirements:
- Windows computer (a dual-CPU machine or two computers are preferable) with more than 2 GB of memory
- Web camera (a 1.3 M pixel camera is preferable)

This software lets you:

test the system using the bundled document images,
construct your own document database from your PDF files.

The current distribution does not support the augmented reality function.
The resolution of query and DB images is limited to 1.3 M pixels or less. The software automatically reduces images with a higher resolution.
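
As an illustration of such a reduction, the sketch below computes a target size that keeps the aspect ratio and fits within a pixel budget. It assumes that "1.3 M pixels" means 1,300,000 pixels and that the image is scaled uniformly; the function name is hypothetical, and the software's actual reduction procedure is not documented on this page.

```python
import math

MAX_PIXELS = 1_300_000  # assumed reading of "1.3 M pixels"


def fit_to_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled down uniformly to at most max_pixels."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height  # already within the limit; leave unchanged
    scale = math.sqrt(max_pixels / pixels)
    # Rounding down both sides guarantees the product stays within the budget.
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


print(fit_to_pixel_budget(1280, 960))   # within the budget, returned unchanged
print(fit_to_pixel_budget(2560, 1920))  # reduced to fit the budget
```

The returned size would then be passed to whatever image resizing routine is in use.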
If you are interested in the software without this limitation, please email us at the following address.

Please note that this software is provided ONLY for research purposes. You CANNOT install it in commercial products.
(patent pending; PCT/JP2006/302669, WO2006/092957)
Source code by Tomohiro Nakai (added on May 23, 2022)
Source code by Kazutaka Takeda (added on May 23, 2022)

For further info.

Who invented this technology?

	Tomohiro Nakai Ph.D. Candidate	Intelligent Media Processing Lab.
	Prof. Koichi Kise
	Dr. Masakazu Iwamura