How does a Photo Database work?
There are now many photo catalog programs on the market, ranging from those that target the absolute beginner to those geared towards professionals in large organizations. Understanding a bit about how these programs work can give you some insight into the important features when comparing them.
Inside your Photo Catalog Software
The following section should give you some idea of how most photo catalog programs work. For the purposes of this discussion, all files on a drive or removable disc that are indexed in the catalog program's database are called controlled. All other files will be referred to as uncontrolled.
Image Database Architecture
Nearly all image catalog programs maintain an index of the images on your hard drive. This index contains a link to every controlled photo in the database along with some additional information. The following list shows the typical properties in each database entry or record:
- Directory pathname - The directory hierarchy that identifies the folder containing the image. eg. D:/Photos/2003/2003-02-14/
- Filename - The name of the file. (eg. 20041031_3255.jpg)
- Index into thumbnail database - An index that is used to locate the thumbnail image in a separate thumbnail database. Some programs may instead store the thumbnail itself in the record.
- List of assigned tags - All keywords or tags that have been assigned to the image. This list can theoretically have dozens of indexes into a category database. eg. Jamaica, Friends, Party
- File properties - All of the commonly-used file system properties. eg. last modified date, read-only flag, etc.
- Other searchable properties - Includes anything else that is associated with the image which can be searched across by the user. One example is the identification of a bookmarked image. Some of these elements may be duplicates of what exists in the image file itself.
- optional: Metadata - Data that is integrated into the image file but that is not part of the image itself. eg. EXIF, IPTC, caption data
- optional: Archive properties - Any additional data that helps the archiving or backup process. As using the Windows file system's archive bit is not a safe way of determining files for incremental backups, the catalog program may have its own last backup timestamp.
- optional: Other - There are many special properties that are probably included in the database entry. These entries facilitate performance or house extra state information. One example might be the Virtual Rotation Flag. This flag allows the catalog program to rotate an image superficially (ie. without touching the original file), just for the purposes of display.
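The properties listed above could be gathered into a single record along the following lines. This is only a sketch: the class name, the field names and the two helper methods are hypothetical and are not taken from any particular catalog program.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CatalogRecord:
    """One hypothetical catalog entry; all field names are illustrative."""
    directory: str                           # eg. "D:/Photos/2003/2003-02-14/"
    filename: str                            # eg. "20041031_3255.jpg"
    thumbnail_index: int                     # key into a separate thumbnail store
    tag_ids: list[int] = field(default_factory=list)  # indexes into a category table
    modified: Optional[datetime] = None      # file system "last modified" date
    read_only: bool = False                  # file system read-only flag
    bookmarked: bool = False                 # an "other searchable property"
    width: int = 0                           # copied from the image header
    height: int = 0
    last_backup: Optional[datetime] = None   # the catalog's own backup timestamp
    virtual_rotation: int = 0                # degrees, applied at display time only

    def needs_backup(self) -> bool:
        """True if the file changed after the catalog last backed it up."""
        if self.modified is None:
            return False
        return self.last_backup is None or self.modified > self.last_backup

    def display_size(self) -> tuple[int, int]:
        """Dimensions as displayed, honoring the virtual rotation flag."""
        if self.virtual_rotation % 180 == 90:
            return (self.height, self.width)
        return (self.width, self.height)
```

Note that the virtual rotation only swaps the dimensions reported for display; the original file is never touched.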
So, for an image database to operate efficiently with thousands of photos under its control, the software should preferably have to deal with only a single file (the database). If the catalog software didn't maintain such a database, it would have to exhaustively search through all files in all folders to perform any useful operation. For example, let's assume that you wanted to locate all photos larger than 640x480 pixels. If the catalog program didn't have such a database (or the image dimensions weren't stored in it), then it would have to locate every file in every folder and read each file's header. This would require thousands of disk accesses. If, on the other hand, the database contained a copy of this information, the catalog software would only need to load the database file once, and no further file accesses would be necessary. Stepping through the database is extremely fast, as it can usually be held in the computer's memory, requiring only a minimum of disk accesses.
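To make the 640x480 example concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a catalog database. Real catalog programs use their own storage formats; the table layout, the larger_than function and the sample rows (apart from the filename style used earlier) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema held entirely in memory; no image file is ever opened.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (directory TEXT, filename TEXT, width INTEGER, height INTEGER)"
)
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?, ?)",
    [
        ("D:/Photos/2003/2003-02-14/", "a.jpg", 640, 480),     # made-up sample row
        ("D:/Photos/2003/2003-02-14/", "b.jpg", 1600, 1200),   # made-up sample row
        ("D:/Photos/2004/2004-10-31/", "20041031_3255.jpg", 2048, 1536),
    ],
)


def larger_than(conn, min_width, min_height):
    """Return filenames whose stored dimensions exceed the given size."""
    rows = conn.execute(
        "SELECT filename FROM photos WHERE width > ? AND height > ? ORDER BY filename",
        (min_width, min_height),
    )
    return [name for (name,) in rows]


big = larger_than(conn, 640, 480)
```

The query is answered from the stored width and height columns alone, which is why the dimensions need to be copied into the database when each file is imported.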
As files are imported into the database, a thumbnail image is generated for every photo. This thumbnail is stored either in its own database or with the main catalog database. These thumbnails are generally small-scale versions of the real images (often in the 100-200 pixels-to-a-side range), and are used to allow quick preview of each entry in the database without having to load and decode each image file. Some programs allow you to change the thumbnail size (eg. Adobe Photoshop Album 2.0), while others (eg. IMatch 3.4) set the size to a fixed dimension that you configure when setting up the database.
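As a rough illustration of how a fixed thumbnail dimension might be applied at import time, the sketch below computes the thumbnail size for a given original. The function name and the rule of never enlarging a small original are assumptions; the actual pixel resampling would be done by an imaging library and is not shown.

```python
def thumbnail_size(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side pixels."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)  # never enlarge a small original
    scale = max_side / longest
    # Keep the aspect ratio and make sure neither side collapses to zero.
    return (max(1, round(width * scale)), max(1, round(height * scale)))


landscape = thumbnail_size(2048, 1536, 160)
portrait = thumbnail_size(1536, 2048, 160)
```

With a 160-pixel limit, a 2048x1536 original yields a 160x120 thumbnail, and the portrait version of the same photo yields 120x160.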
The advantage of having a fixed-size thumbnail in the database is that the display will be very efficient. No resizing of the image needs to be done as each thumbnail is displayed. Programs like Adobe Photoshop Album 2.0 appear to approach the thumbnail strategy in a different way.
Photoshop Album 2.0 saves a moderate-sized thumbnail in the database (ranging between 80x60 and 320x240) and appears to dynamically resize the image to match the user's preview choice. This continuous resizing of thumbnails (eg. from 320x240 to 200x150) takes processing power and can slow down the preview pane considerably. It also lets you change the preview size to be larger than the thumbnail size (eg. > 320x240), but it does this by first showing an up-sized thumbnail, then redrawing it with a resample of the actual file once that file has loaded. This is an incredibly slow way to display thumbnails, but some users will prefer the ability to see large previews in the preview pane.
In addition, if you are planning on keeping a lot of your images "offline" (see the section on offline media), then it makes sense to use larger thumbnails. As you are limited to seeing only the thumbnails of offline images (unless you first locate the individual disk containing the image), it's important to ensure that the thumbnail offers enough detail to adequately identify and distinguish photos. However, in some programs (like IMatch), a separate thumbnail size can be configured specifically for offline images. This way you can keep manageably sized thumbnails for the online working set and larger, screen-sized previews for offline images (under Options > Preferences > Off-Line Cache).
IMatch's restriction against changing the thumbnail size after the database has been set up is an odd one. It should be possible for the program to offer an option that discards the thumbnail database and recreates it with a different thumbnail dimension. While this could take minutes or hours to do (and would involve bringing all offline media back online), it would allow someone to change their mind later on if their needs changed, or if they simply didn't consider the choice carefully enough at the start. While this is not something that one would do often (at least not as often as Photoshop Album's dynamic thumbnail sizing), it shouldn't be a difficult feature to implement.
One of the big advantages of catalog programs is the ability to work with photos (categorization, captioning, sorting, etc.) even though they may no longer be on your hard drive. As collections grow large, many photographers end up transferring their original images to removable media (CD-R or DVD-R), leaving only the thumbnails behind in the database. These images are said to be offline.
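One way a catalog could tell online images from offline ones is to record which volume holds each original and compare that against the volumes currently available. The sketch below assumes exactly that; the Entry class, the volume labels and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    filename: str
    volume: str  # hypothetical label of the drive or disc holding the original


def split_by_availability(entries, mounted_volumes):
    """Separate entries into (online, offline) by whether their volume is mounted."""
    online, offline = [], []
    for entry in entries:
        (online if entry.volume in mounted_volumes else offline).append(entry)
    return online, offline


entries = [
    Entry("20041031_3255.jpg", "HDD"),          # still on the hard drive
    Entry("20030214_0001.jpg", "CD-2003-01"),   # made-up file moved to a CD-R
]
online, offline = split_by_availability(entries, mounted_volumes={"HDD"})
```

Offline entries remain fully usable for tagging and captioning, since those operations only touch the database; only viewing the full image requires the disc.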
In addition to the ability to categorize and label images that are on removable disks, one is also generally able to work with multiple synchronized catalogs. The most common use for this is keeping all of the original photos and the catalog on your main computer at home, while still being able to bring a copy of the catalog with you on a laptop. The size of a catalog may be on the order of hundreds of megabytes, while the size of the actual photo library might be tens of gigabytes.
Some catalog programs will synchronize multiple catalogs and let you make changes on more than one. Later, when the two computers are connected, the changes in each are merged, with any conflicts resolved by the user. As most programs don't offer this flexibility (two-way synchronization), users often mimic it by duplicating the database on the second machine (one-way synchronization) and making changes only in the main computer's database. The second database (eg. on the laptop) would then only be used for viewing purposes (ie. read-only).
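Two-way synchronization can be sketched as a three-way merge against a snapshot taken at the last synchronization. This is one common approach rather than the method of any specific catalog program; the function name, the dictionary layout and the a.jpg, b.jpg and c.jpg sample entries are assumptions.

```python
def merge_catalogs(base, main, laptop):
    """Three-way merge of {filename: tags} dictionaries.

    Returns (merged, conflicts); conflicts lists the filenames that were
    changed differently on both sides and need a decision from the user.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(main) | set(laptop)):
        old, m, lap = base.get(name), main.get(name), laptop.get(name)
        if m == lap:        # both sides agree, or neither changed
            value = m
        elif m == old:      # only the laptop changed this entry
            value = lap
        elif lap == old:    # only the main computer changed this entry
            value = m
        else:               # both changed it differently
            conflicts.append(name)
            value = m       # keep the main copy until the user decides
        if value is not None:
            merged[name] = value
    return merged, conflicts


base = {"a.jpg": {"Jamaica"}, "b.jpg": {"Friends"}, "c.jpg": {"Party"}}
main = {"a.jpg": {"Jamaica", "Friends"}, "b.jpg": {"Friends"}, "c.jpg": {"Party", "Friends"}}
laptop = {"a.jpg": {"Jamaica"}, "b.jpg": {"Friends", "Party"}, "c.jpg": {"Jamaica"}}
merged, conflicts = merge_catalogs(base, main, laptop)
```

Here a.jpg takes the main computer's tags, b.jpg takes the laptop's, and c.jpg is reported as a conflict. One-way synchronization is the degenerate case where the laptop copy is never edited, so the main catalog always wins.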