How Image databases get out of sync / corrupted
How Image databases monitor external changes
This section gives a brief overview of the ways in which an image database can become corrupted or fall out of sync with the underlying images. Understanding how the image database software works with the files tells you which operations will cause problems for your catalog. In particular, simple acts such as renaming or editing photos can lead to trouble. The following sections stress why you shouldn't change files outside the catalog program without being aware of how the image database works.
Making changes within the catalog software
As described in the image database architecture section, the photo catalog maintains a list of directory paths and filenames for every image in the database. While working within the catalog program, you have the freedom to rename folders, move folders and move files between folders. Similarly, you can modify the files themselves by renaming or deleting them.
As the asset management software is the means by which you are making these modifications, it is a simple matter for the asset management program to update the list of files in the database. Through the changes listed above, it is easy to see that the filename and directory paths might change drastically.
Some asset management programs allow you to do batch renaming, whereby all selected files are renamed according to some naming scheme (see the file naming section). This is often necessary when importing photos into the catalog for the first time (e.g. you've already copied the images to the hard drive via some other program, but now you want them to be under the control of a new asset management program). Again, because the catalog software carries out the renaming itself, it can simply update all of the image records in the database to reflect the changes.
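To make the point concrete, here is a minimal sketch of a rename performed by the catalog itself. It assumes a hypothetical catalog held as a Python dict that maps each file path to its record; real asset management programs use their own database formats, so the names `catalog` and `rename_in_catalog` are purely illustrative.

```python
import os

def rename_in_catalog(catalog, old_path, new_name):
    """Rename a file on disk and move its catalog record to the new path.

    `catalog` is a hypothetical dict of {file path: record}.
    """
    if old_path not in catalog:
        raise KeyError(old_path)
    new_path = os.path.join(os.path.dirname(old_path), new_name)
    if os.path.exists(new_path):
        raise FileExistsError(new_path)
    os.rename(old_path, new_path)              # change the file system...
    catalog[new_path] = catalog.pop(old_path)  # ...and the record together
    return new_path
```

Because both steps happen in the same function, the record and the file never disagree about the file's location.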
Modifying / renaming / moving photos outside of the Catalog Software
Now, let's take a look at what happens when you try to make the same type of changes outside of the image database software. For this scenario, we will assume that you have quit the image database program and are in the standard Windows Explorer view.
If you locate the images that are under the control of the database program (i.e. the image files that have been added to the database catalog) and make some changes, you can expect disastrous consequences.
Let's say that you were to drag a few pictures (eg. frog1.jpg and frog2.jpg) from one folder (animals) into another folder (amphibians) and then open up your catalog software again. In all likelihood nothing would look any different. We're fine, right? NO! Upon navigating to the frog1.jpg image, one can see the thumbnail of the frog, but why does it show up in the animals directory still?
Worse yet, when we double-click on the frog1.jpg thumbnail, we'll likely get an error indicating that the file cannot be found. At this point, we have made the database out-of-date with respect to the underlying file system. The directory path to the image has changed, but the catalog program was never notified of any change.
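The mismatch described above can be detected with a simple check of every catalogued path against the file system. The sketch below assumes the catalog can hand over its stored paths as a list; `find_missing` is a hypothetical helper, not a function of any particular catalog program.

```python
import os

def find_missing(catalog_paths):
    """Return the catalogued paths that no longer exist on disk."""
    return [path for path in catalog_paths if not os.path.isfile(path)]
```

After dragging frog1.jpg from animals into amphibians, the old animals path would appear in the returned list, which is exactly the record that is now out of date.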
Using any external (i.e. outside your catalog program) file renaming utility will likely cause this sort of database error. Unfortunately, the batch renaming functionality available in most catalog programs is weak enough to warrant using an external program, so you must be very careful about how you approach the renaming.
Depending on the catalog program in use, one of several strategies might be used to fix this scenario. Some programs have more intelligent recovery scenarios than others. The most common options provided to the user are:
- Error, with no option to immediately update the location. Nowadays, it is unlikely that a program will leave you hanging at this step without some means of allowing you to correct it.
- Asks you to locate each missing file. This is the most common method (used by programs like Adobe Photoshop Album 2.0) and is not particularly helpful. If you've made the unfortunate mistake of moving a lot of files, you might end up having to locate each photo manually, and this is almost always the case when you've changed the filename. See the smart relocation detection section.
- Asks you to locate a missing file, tries to locate rest. If you've simply moved a number of images from one folder into another, then the program can start with the assumption that the other missing photos have also been moved to the same destination folder. This way the tool can save you a lot of trouble. For files moved from other folders, you will be prompted again. This method does not handle the case where a root folder has been changed.
- Asks you to locate a missing file, tries to locate rest, looks at changes in root of hierarchy. This is better than the previous method: if the program determines that you've moved a file from one root level in the hierarchy to another, it will assume other folders have changed their roots as well.
- Finds possible matches across regions of your hard drive from the metadata stored in the database. If you've changed the filenames, or a lot of folder contents have been moved around, this is the easiest way to fix the database. The software extracts some of the characteristics of the missing/relocated image from the entry in the database. These details might include filename, filesize, CRC, EXIF date and time, etc. Then, every file in the specified folder hierarchy is examined for a match. Generally, it would make sense that it would skip comparing against files that already exist in the database (and are in the proper location).
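The last option in the list can be sketched as follows. This is only an illustration, assuming the database record keeps the file size and a CRC-32 of the file contents under the hypothetical keys `size` and `crc32`; real programs may store different characteristics (such as the EXIF date) and compare them differently.

```python
import os
import zlib

def crc32_of(path, chunk_size=65536):
    """Compute the CRC-32 of a file without loading it all into memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

def find_candidates(record, search_root, known_paths=()):
    """Walk `search_root` and return files matching the record's metadata.

    `record` is a hypothetical dict with 'size' and 'crc32' keys.
    `known_paths` are files already correctly catalogued, which are skipped.
    """
    known = {os.path.normcase(os.path.abspath(p)) for p in known_paths}
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.normcase(os.path.abspath(path)) in known:
                continue
            # Cheap test first: only checksum files of the right size.
            if os.path.getsize(path) != record["size"]:
                continue
            if crc32_of(path) == record["crc32"]:
                matches.append(path)
    return matches
```

Comparing sizes before computing a checksum keeps the scan fast, since most files can be rejected without being read. Note that this match works even when the filename has changed, because the filename is not part of the comparison.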
Some relocation implementations are better than others. Unfortunately, most of them fail at handling one simple change: renaming a root folder in the hierarchy.
When a file cannot be found by the catalog program, it will ask for help in identifying the new subfolder that may contain the moved image. For any other missing files in the same old folder, it will then ideally attempt to check the recently-learnt new folder path. This may work fine for files that are in one directory, but once it begins processing the missing files in the next directory, it must again ask the user for the location of the new folder.
Changing the root folder name, or a folder higher up in the hierarchy will cause all folders below the change to fail in locating the images. A smart system would recognize a change at the top of the hierarchy, and try to predict other folder locations given that information. Let's look at an example.
|Original hierarchy of files:|
E:\
  Photos\
    2003\
      2003-08-09\
        20030809_3443.jpg (file D)
    2004\
      2004-01-01\
        20040101_8823.jpg (file A)
        20040101_8825.jpg (file B)
      2004-01-02\
        20040102_9012.jpg
        20040102_9013.jpg (file C)
        20040102_9014.jpg
Now let's rename one of the topmost folders from Photos to Work, outside of the catalog program's control.
|New hierarchy of files:|
E:\
  Work\
    2003\
      2003-08-09\
        20030809_3443.jpg (file D)
    2004\
      2004-01-01\
        20040101_8823.jpg (file A)
        20040101_8825.jpg (file B)
      2004-01-02\
        20040102_9012.jpg
        20040102_9013.jpg (file C)
        20040102_9014.jpg
While we only made one small change, all links in the database will immediately become "disconnected" from the actual images as the directory pathnames have changed.
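|File||Original Path||New Path|
|A||E:\Photos\2004\2004-01-01\||E:\Work\2004\2004-01-01\|
|B||E:\Photos\2004\2004-01-01\||E:\Work\2004\2004-01-01\|
|C||E:\Photos\2004\2004-01-02\||E:\Work\2004\2004-01-02\|
|D||E:\Photos\2003\2003-08-09\||E:\Work\2003\2003-08-09\|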
When the catalog program tries to find File A, it detects that the file has moved from E:\Photos\2004\2004-01-01\ to E:\Work\2004\2004-01-01\. Then, the program will move on to File B. Since it already recognizes the original path, it tries to look in the same "New Path" as for File A. It works.
Now here's where the problem starts. When it tries to locate File C, it asks the user for the new location because it doesn't recognize the "Original Path". This is where a smart relocation algorithm would handle things differently. It should first check to see if File C's path matches any of the previously seen "Original Paths". If none are recognized, then it should consider the changes between File A's "Original Path" and "New Path". A somewhat intelligent algorithm would notice that the only change was in one of the folder names higher up in the directory hierarchy.
At this point, the program should attempt to search and replace Photos with Work on future directory mismatches. If this were done, it would no longer have to ask the user for any other missing files. Unfortunately, most programs will instead ask you to locate a file from every single folder in the system!
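The search-and-replace idea can be sketched in a few lines. This is one possible approach, not the method of any particular catalog program: it strips the trailing folders that the old and new paths share, treats what remains as the changed prefix, and applies that substitution to other missing folders. The function names are hypothetical, and `PureWindowsPath` is used only so that the E:\ example paths are parsed the same way on any operating system.

```python
from pathlib import PureWindowsPath

def infer_prefix_change(old_dir, new_dir):
    """Given one folder's old and new location, find the renamed prefix.

    Returns (old_prefix, new_prefix), or None if the paths are identical.
    """
    old_parts = PureWindowsPath(old_dir).parts
    new_parts = PureWindowsPath(new_dir).parts
    # Strip the trailing components that both paths have in common.
    while old_parts and new_parts and old_parts[-1] == new_parts[-1]:
        old_parts, new_parts = old_parts[:-1], new_parts[:-1]
    if not old_parts or not new_parts:
        return None
    return PureWindowsPath(*old_parts), PureWindowsPath(*new_parts)

def predict_new_dir(old_dir, old_prefix, new_prefix):
    """Apply a learnt prefix change to another missing folder.

    Returns the predicted folder, or None if `old_dir` is not under
    `old_prefix` (in which case the user still has to be asked).
    """
    try:
        rest = PureWindowsPath(old_dir).relative_to(old_prefix)
    except ValueError:
        return None
    return new_prefix / rest

# File A tells us that E:\Photos became E:\Work...
old_prefix, new_prefix = infer_prefix_change(
    r"E:\Photos\2004\2004-01-01", r"E:\Work\2004\2004-01-01")
# ...so File C's folder can be predicted without asking the user.
print(predict_new_dir(r"E:\Photos\2004\2004-01-02", old_prefix, new_prefix))
```

A real implementation would still confirm that the file actually exists at the predicted location before updating the database, and fall back to prompting the user if it does not.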
Updates due to editing an image
We have just seen how moving or renaming the controlled files can cause the database to get out of date. The other type of inconsistency that is easy to create is the modification of the image file with an external application. Most often, this is simply due to opening the image in an editor such as Photoshop and resaving the photo. Unless the catalog program has been actively monitoring the files (see the monitoring section below), it will have no way of telling that the thumbnail (or any other stored image feature) is now out of date, and it will give no indication that anything has changed. It is up to the user to instruct the catalog program to update the thumbnail or redetect the image. Unfortunately, this extra step is often forgotten, leading to an out-of-date catalog.
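One simple way a catalog could notice an external edit is to remember each file's size and modification time when the thumbnail is built, and compare them later. The sketch below assumes the catalog stores such a snapshot per file; `snapshot` and `is_stale` are hypothetical names, and this is not a description of how any specific program works.

```python
import os

def snapshot(path):
    """Record the file attributes to store alongside the thumbnail."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}

def is_stale(path, stored):
    """True if the file on disk no longer matches the stored snapshot."""
    return snapshot(path) != stored
```

This check is cheap but not foolproof: an edit that preserves both the file size and the timestamp would go unnoticed, so a checksum of the contents would be a more robust (and slower) alternative.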
Ways to monitor & counteract these modification issues
There are several ways that catalog programs attempt to monitor or track your edited files. For a complete summary of these issues and methods, please see the section on Catalogs and Multiple Versions of the Same Photo.