
Catalogs and multiple versions of same photo

As of this writing, one of the most neglected functions in photo cataloging software is the ability to handle multiple versions of the same image. Left unaddressed for too long, this can become a serious frustration. Fortunately, this functionality is widely expected to be incorporated into the next releases of most decent photo catalog applications.

So, what is the problem?

When one uses a keyword tagging system (which is the heart of most photo catalog programs), a basic premise is that each photo in the catalog can be assigned its own unique set of tags. Furthermore, most catalog programs also treat each photo file on the hard drive as a separate entity in the catalog. When one edits an image, a fundamental rule of digital photography is to preserve the original and create a copy of it for editing.

In the days of film, you wouldn't draw on the negative, would you? Okay, so sometimes we did, but this was generally a last resort. One mistake and the original is gone. The same applies to digital photos. Because it is so easy to modify a digital photo, it is also exceptionally easy to make an editing mistake that ruins the "digital original" (if one didn't duplicate the original first).

Given that we always duplicate the original before editing, we end up with two or more copies of the same basic photo. As these edits are generally done outside of the cataloging program (except in applications like Photoshop Album 2.0 or Photoshop Elements 3), the catalog program has no way of telling that the newly-generated file in a folder is actually based on an image that is already in the database. Result? The catalog program assumes that the edited version is a completely new photo, and keywords assigned to the original will not be assigned to the edited version.

Comparison of catalog software & versioning

See the comparison chart between catalog programs. It compares a number of important features, and includes a section on how the versioning approach is handled in each application that supports it.

Summary of the multiple version problem

As the keywords that were assigned to the original photo are not copied automatically to the edited version(s), one ends up with an improperly-tagged database. Some photos will have all tags assigned, while others will be missing them. In a large image database, an unassigned / untagged photo is as good as non-existent. Without the keys to retrieve the photo, the only option left is to manually wade through the file hierarchy and hope that you spot the photo.

Even if you manually copy the tags from the original to the newly-imported edited version, you run the serious risk of having an out-of-date set of tags. Someday you might add or remove a tag from the edited version, but forget to apply the same change to the original. Similarly, one might modify the original version's tags but not the edited version. In both cases, the tagging will be out-of-date on one of the files.

When it comes to version control, there are three problems to solve:

  1. Identifying multiple versions and associating them with the original.
  2. Keeping tags current between versions.
  3. Displaying multiple versions in the browser.

Problem 1: Identifying multiple versions

The first problem with versioning is: how does the catalog program know that multiple images on the drive are related? If the software is unaware that several files on the drive are associated with each other, then there is also no way that tags can be maintained between these multiple versions (problem #2 below).

Some catalog programs today attempt to solve this issue, and various strategies are in use. The following is a list of the usual approaches performed by the catalog program, with the best methods at the top:

  • Invokes the creation of new versions, duplicating the original if necessary.
    The catalog program is aware of the creation of additional versions because it is used to invoke the editor itself. It may also be responsible for duplicating the original photo first, and then sending the copy to an external editor. The catalog program can then, for example, hide the unedited version and replace it with the edited copy. Other tools simply invoke a script that accomplishes the same thing (e.g. select a thumbnail, select the script command, and the duplicate is automatically created in the file system, along with a new entry in the database). One potential limitation of this approach is that it is very easy to circumvent by copying and pasting into new documents within an external editor. Such a scenario would defeat the automated tracking provided by the invocation approach. Fortunately, most catalog programs also provide one of the other methods described below as a backup, so that these files do not remain catalog orphans.
  • Monitors the file system for changes in folders and imports these into the database.
    The user enables a mode in the catalog program that starts monitoring the file system for changes. If it detects that a file under the catalog's control has changed, it automatically re-reads the image and updates the metadata along with the thumbnail. If a new file is detected in a controlled folder, that image is added to the database. See the related approach under Problem #2 (detection of new files automatically matched with originals).
  • Performs the actual edits and therefore creates the additional file versions.
    If the catalog program itself performs the edits, then one can expect that it will maintain some degree of association with the original (either by name or by a link within the database). The reason I feel that this is not necessarily a good approach is that for a catalog program to do a good job at cataloging, it shouldn't also try to be a good editor. The editing functionality within most catalog programs is limited at best, and it is often best to leave this job to an external editor, such as Photoshop. The only exception I see is Adobe's Photoshop Elements 3. As Adobe produces an excellent editor and has now effectively married it to the catalog program, it is in the unique position to offer complete integration. I don't expect any other company to offer an equivalent two-in-one package. However, from the common user's perspective, the basic editing functionality within these catalog programs may be sufficient to cover 90% of their needs (e.g. exposure, cropping, rotating, sharpening), thereby removing the need for an additional tool in the workflow.
  • No internal support for the creation of versions or new-file detection.
    This is the most common scenario for catalog programs, and will likely remain so until versioning becomes an important differentiating / selling point for products. The onus is on the user to duplicate the original source, edit the file with an external application, and then locate the new file manually within the catalog program. Of course, it is fairly easy to forget the re-import process and be left with many files that are essentially orphaned by the database (there is no connection to them, and they are effectively lost).
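To make the first (invocation) approach more concrete, here is a minimal Python sketch of what a "create version" command might do behind the scenes: duplicate the original, register the copy under the same asset in the catalog, and hand only the copy to the external editor. The `create_version` function, the `-vN` suffix naming scheme and the plain dict standing in for the catalog database are all hypothetical; this is not how any particular catalog program implements it.

```python
import shutil
from pathlib import Path


def create_version(original: Path, catalog: dict[str, str]) -> Path:
    """Duplicate `original` and register the copy under the same asset ID.

    Hypothetical naming scheme: photo.jpg -> photo-v1.jpg, photo-v2.jpg, ...
    `catalog` stands in for the catalog database: it maps a file path to the
    asset ID that groups all versions of one photo.
    """
    # Register the original itself if the catalog hasn't seen it yet.
    asset_id = catalog.setdefault(str(original), original.stem)

    # Find the first unused version number.
    n = 1
    while True:
        candidate = original.with_name(f"{original.stem}-v{n}{original.suffix}")
        if not candidate.exists():
            break
        n += 1

    shutil.copy2(original, candidate)  # copy the pixels plus file timestamps
    catalog[str(candidate)] = asset_id  # link the new file to the original's asset
    return candidate  # open this path (never the original) in the external editor
```

As noted above, this only tracks copies made through the command itself; a copy-and-paste into a new document inside the editor bypasses it entirely.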

Orphaned images: photos "lost" until rescan

A very big problem with relying on a catalog program to handle all of your future photo searching (versus searching through folders manually) is that if the files aren't known to the database program, they are effectively lost. As the collection grows, it becomes less likely that one will manually locate these orphaned images, and they will hide in the folders unnoticed. Fortunately, a number of catalog programs offer some way of re-scanning the controlled folders for files that may have been added or modified. Occasional re-scanning may be necessary if your catalog program falls into the last category described above.
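A rescan of this kind is essentially a set difference between the image files found in the controlled folders and the files the database already knows about. The sketch below assumes a hypothetical `find_orphans` helper, a set of resolved path strings standing in for the catalog's file list, and an illustrative (not exhaustive) list of image extensions.

```python
from pathlib import Path

# Illustrative set of extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".psd", ".nef", ".cr2", ".dng"}


def find_orphans(root: Path, cataloged: set[str]) -> list[Path]:
    """Return image files under `root` (recursively) that the catalog doesn't know about."""
    orphans = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            # Compare resolved paths so relative/absolute spellings match.
            if str(path.resolve()) not in cataloged:
                orphans.append(path)
    return orphans
```

A continuously monitoring catalog would react to file system notifications rather than walking the whole tree, but a periodic full scan like this also catches files written by applications that never trigger such notifications.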

Problem 2: Keeping tags current between multiple versions

Depending on the existing support for identifying newly created versions of an image, the catalog program will also have varying levels of support for the maintenance of the tags between these versions. As described at the top of this page, the maintenance of tags between versions is a critical feature for many users. Having some degree of support from the catalog program is essential. The following lists some of the various approaches offered by catalog programs today:

  • Native support for versions.
    The catalog program maintains an integrated link (i.e. not through user scripting, etc.) between all versions of the same photo. As it is able to quickly find related images (through internal indices, for example), tagging is generally associated with a single asset (e.g. a photo), and not with individual variants of that asset. Adding a tag to a photo in the database should be reflected automatically in all versions of the same photo. Such an approach should provide for both common tags and individual tags (described below). Programs that fall into this category will probably use an efficient user interface mechanism to display the associations (see Image Stacks below, under Problem #3).
  • Detection of new files is automatically matched with originals.
    This is not a common approach, but if implemented, it could save a fair amount of work by automating a step in the workflow. A catalog program should be able to recognize the addition of new files and have a means of locating the original, whether through naming conventions, metadata comparisons or user input. Once the program has a link between the versions, it can keep the tags together. Unfortunately, few applications currently available appear to do this. Programs monitor folders for changes, but they typically aren't smart enough to locate the original. In all likelihood, a program that went to the extent of providing this feature would do it properly and support image stacks natively (see above).
  • Scripts are used to transfer tags between original and edited versions.
    If the scripting environment is extensive enough, it is a fairly easy task to write a script that will keep tags between the originals and edited versions up-to-date.
  • User has to manually maintain tags between versions.
    This is by far the most common method, and is the only option if Problem #1 reveals that the software does not support the invocation / detection strategies listed there. Most catalog programs available today rely on the user to handle version support and maintenance, and unfortunately it is a tedious and error-prone task. Beware if your program fits into this category, as you may find it to be a lot more work than you expect.
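Matching by naming convention is probably the simplest way a program or script could link a newly detected file to its original. The sketch below assumes a hypothetical convention in which every version begins with the same date plus sequence number (similar in spirit to the schemes discussed in the reader comments below), followed by an optional suffix describing the edit. The `VERSION_KEY` pattern and the `version_key` / `match_original` functions are illustrative only.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming convention: every version of a photo starts with the
# same date + sequence number, optionally followed by a suffix, e.g.
#   20050108_0042.nef                            (original)
#   20050108_0042_bw.jpg, 20050108_0042_web.jpg  (edited versions)
VERSION_KEY = re.compile(r"^(\d{8}_\d{4,6})")


def version_key(path: Path) -> Optional[str]:
    """Return the shared date + sequence prefix, or None if the name doesn't fit the scheme."""
    m = VERSION_KEY.match(path.name)
    return m.group(1) if m else None


def match_original(new_file: Path, originals: list[Path]) -> Optional[Path]:
    """Find a cataloged file that shares new_file's version key."""
    key = version_key(new_file)
    if key is None:
        return None  # no recognizable key: leave the association to the user
    for candidate in originals:
        if candidate != new_file and version_key(candidate) == key:
            return candidate
    return None
```

Files with no recognizable key return `None`, so the user still has to make those associations manually.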

For those applications that support multiple versions natively, it is important to be able to define both common and individual tags. By common tags I mean that applying a tag such as People:Fred to an asset (which includes all versions spawned from the same photo) copies this tag to all versions of the same photo. By individual tags I mean that each version based on the original can have its own independent tags that are not copied between versions. Examples might be a rating or state tag such as Ratings:Excellent or State:Sharpened. My Manage Versions script, for example, supports the notion of common and individual tags by defining a list of tags that are not transferred when maintaining tags between versions.
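The common / individual distinction can be expressed as a small tag-sync routine. This is a hypothetical sketch, not the Manage Versions script itself: it assumes tags are `Category:Value` strings, that a fixed set of categories (here `Ratings` and `State`) holds individual tags, and that the original acts as the master whose common tags are pushed to every version.

```python
# Hypothetical: tags in these categories belong to a single version only.
INDIVIDUAL_CATEGORIES = {"Ratings", "State"}


def is_individual(tag: str) -> bool:
    """True if the tag's category (the part before ':') is version-specific."""
    return tag.split(":", 1)[0] in INDIVIDUAL_CATEGORIES


def sync_tags(master: set[str], versions: dict[str, set[str]]) -> dict[str, set[str]]:
    """Give every version the master's common tags while keeping its own individual tags."""
    common = {t for t in master if not is_individual(t)}
    return {
        name: common | {t for t in tags if is_individual(t)}
        for name, tags in versions.items()
    }
```

Treating one version as the master means that removing a common tag from the original also removes it from the edits. Simply taking the union of all versions' tags would propagate additions but could never propagate a removal, which is exactly the out-of-date tagging problem described earlier.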

Problem 3: How to display multiple versions

Assuming that the catalog program inherently supports multiple versions, the last item to deal with is how the information is conveyed to the user. The following lists the usual approaches:

  • Image stacks / versions
    Only a single thumbnail is displayed in the catalog, representing all versions of the same base photo. An indicator in the thumbnail view will show that multiple photos exist under the main thumbnail. Opening this will reveal all photos that have been associated with this image stack. A good use for this is keeping all of the edited versions along with the original, in which case you will probably keep the best edited version at the top of the stack.
  • Individual images, no visual association
    Unless the program offers native version support, this is probably the most likely scenario the user will face. All individual versions are displayed in the thumbnail window. Some indication might be given so that multiple versions of the same photo can be easily identified (by color coding or sort order). Obviously, this is much less desirable than a program that supports image stacks (or an equivalent) in the user interface.
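Building image stacks amounts to grouping files by whatever key links the versions and choosing which member is shown on top. The sketch below assumes a hypothetical naming key in which all versions share a date plus sequence-number prefix; `build_stacks` and the `top_picks` mapping (for example, the best edited version chosen by the user) are illustrative names, not any program's API.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical key: the date + sequence number shared by all versions of a photo.
VERSION_KEY = re.compile(r"^(\d{8}_\d{4,6})")


def build_stacks(filenames: list[str],
                 top_picks: Optional[dict[str, str]] = None) -> dict[str, list[str]]:
    """Group filenames into stacks, returning {key: [top, *rest]}.

    top_picks optionally maps a stack key to the version the user wants on top.
    Files without a recognizable key each form a stack of their own.
    """
    top_picks = top_picks or {}
    stacks = defaultdict(list)
    for name in sorted(filenames):
        m = VERSION_KEY.match(name)
        stacks[m.group(1) if m else name].append(name)
    for key, members in stacks.items():
        top = top_picks.get(key)
        if top in members:  # move the chosen version to the front of its stack
            members.remove(top)
            members.insert(0, top)
    return dict(stacks)
```

The thumbnail view would then show only the first entry of each stack, e.g. `[members[0] for members in stacks.values()]`, with an indicator wherever a stack has more than one member.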

Difference between revision control and multi-file versions

See the differences between the types of versioning support.

 


Reader's Comments:

2005-01-24 Terry Cockfield
 

Thanks for the summary. I'm just trying to get to grips with the cataloguing business and very much appreciate having information like this to highlight the issues.

Terry

2005-01-09 sandy demees
 

First, again, my compliments on this article. It's great to see such an initiative.

Some of my comments:

  • "Identifying multiple versions -> Invokes the creation of new versions; ... It may also be responsible for duplicating the original photo first, and then sending a copy to an external editor. ...."

    I am not a fan of creating duplicates when a photo is opened in an external editor. What if I open a photo in an editor and decide not to edit it after all? That would leave me with "garbage": versions that were created in advance and are not real versions. And that is something I try to prevent in my catalog. A technical solution would be for the application to "wait" for the editor to close, but I am not a fan of that either, because then I won't be able to access my cataloging info as long as the editor is open.

    Therefore I want to be the one who decides when a duplicate is created. I create the duplicate and the program should store it as a version.
  • "Monitors the file system for changes in folders".
    This would need more detail. File system notifications are sent after the operation has completed; for instance, the file has already been overwritten, which means that the original file cannot be recovered by the cataloging app for version storage. Also, not all applications are so kind as to trigger OS notification messages when they perform file operations.
  • "Performs the actual edits"
    Fully agree on the versioning principles, although not with your side remark "The editing functionality within most catalog programs is poor at best". There are cataloging programs that offer basic editing features that are more than sufficient for average editing. It is in cases where extensive edits are required (less than 10%?) that the external editor pops in for me.
  • "No internal support".
    A versioning tool should at least require this sort of version control.
  • "Native support"
    This is just how it should be done and how idimager has it implemented.
  • "Detection of new files is automatically matched with originals"
    I agree with your comment that this will not be done by cataloging programs in the short run. Scanning the images would require each image's content to be compared against the content of all other images to see if there is a match. This would have to be a monitoring process that would slow the entire system down significantly. Megapixel counts typically go up as technology advances, so I think the far future will not bring this either. I might be wrong about this, and it would be great if some program implemented it.
Too bad your descriptions of Problem 2 are mainly focussed on how you could achieve versioning using the current version of iMatch and do not widen the vision. Again, my compliments, and I hope to see more of these.

 

Thanks for the great comments!

Whatever mechanism one provides to automate the versioning process, there will always be boundary conditions that cannot be controlled. You've pointed out some situations that would be incredibly hard to monitor properly. No matter what, we still require the manual addition method, which all catalog programs allow for. Part of my intent in suggesting these alternatives is that they may catch 90% of the situations in which additional versions are created. When one essentially hands over the keys to one's photo collection to the catalog software by renaming all folders and files to dates and sequence numbers, one is absolutely dependent on new files being added to the database whenever possible. Missing the addition of a file pretty much renders it lost, as an orphaned file. Unless one actively checks through all folders again, it is easy to overlook these missing files. So, I think some level of safeguarding the duplicate-generation process is helpful. I'd need to give this some more thought, though.

As for the comments on the photo editing aspect of catalog software... I have changed my wording to reflect a more balanced view of the issue. Overall, the functionality in these editors is generally quite simple, but it often does satisfy the needs of many consumers. I suppose a good percentage of users only care about cropping, red-eye reduction, exposure correction and perhaps sharpening. Many others would completely ignore this aspect of the application and prefer to use their favorite third-party application. My point is that I would rather see the effort go into providing a solid, full-featured catalog application than into making it a Swiss Army knife, which would only detract from the quality of the catalog functionality and from the R&D invested in developing it further.

The automatic detection and matching to originals is a pipe dream. You're quite right that the processing requirements are going to keep increasing and may never come within the realistic capabilities of the common desktop machine. There is also the larger issue that new versions of a file may look nothing like the original after certain edits, thereby defeating the whole concept. Therefore, I am thinking more along the lines of a comparison search of various metadata fields, which may be preserved through some of the editing workflow. At the very least, an attempt can be made to offer suggested matches to the user, starting with the current folder contents.

2005-01-08 Adam S
 

GREAT summary. I've been thinking about this versioning issue for a long time but never took the time you have to put it all down. Thanks a lot for doing this.

Personally, I would be happy with a product that allowed me to create "stacks" à la Elements 3. Automatic detection of new versions would be nice, but as a feature it is further down the priority list than a simple stacking system.

I've got an iMatch license and hope Mario introduces a feature like this in a coming version. But I switched to iView's MediaPro product when it came out for the PC (being a former MediaPro user on the Mac, I was very familiar with the program). idimager is intriguing but clearly still a maturing product.

Anyway, thanks again for getting these thoughts on the web. Hopefully some developer is listening!

2005-01-05 john beardsworth
 

I think what you say is a fair review of the current state of play. Maybe I would also add a "forensic" method - Photoshop CS can store editing history metadata. That's not really cataloguing, I know, but it may help in the future.

One weakness of the stacks, "invoked" and "intelligent" methods is that often one creates a copy image, a cut and paste into a new window. So, other than pixel matching, how can a catalogue ever cope with this?

So you need to turn the issue round a little to what the photographer can do to make life easy, whatever package he/she is using.

I believe in the use of a unique filename that is in every version of the image. There are different approaches, but mine happens to be YYMMDD + 6 digit sequential + short description.

The key is the 6 digit sequential because that is included in every version of every image. So if I use a keyword search to find my raw file, I then just search on the 6 digit code and can get to various edits such as b&w version, web jpeg etc. It's quite a lazy way to do it, but solid.

When you move to a different catalogue program, you'll also benefit because it's a catalogue independent approach, which I think would appeal to you.

 

John - I definitely agree with your assessment of the limitations, even with native support for stacks. Re-evaluating the issues and breaking it down into three problems, I have now rewritten the article to address the issue in greater depth.

Using the sequential number scheme is key. I certainly do this too, although I use a 4 (digital) or 6 (film) digit sequential number and don't include any description in the filenames. As an aside, my belief has been that it is hard to create a scalable / beneficial approach to having descriptions in the filenames unless one uses a controlled vocabulary. The overall numerical string is kept constant in the naming of all variants from the same image. Like you have said, you then always have the fallback to search on: linking by filenames. Not only does it allow searching to bring up all versions of the same photo, but it also enabled me to write a script that could automatically find all versions of the same file. This is the basis for my Manage Versions script.

And the last point you make is also a good one: trying to ensure that the solutions are platform-independent. Using a careful file-naming scheme can help preserve multiple-version associations even after changing catalog software.

 

