IMatch Script:
Manage Versions 2.3.4

www.ImpulseAdventure.com/photo/

Written by

Calvin Hass

Latest Updates

For the latest version and updates, please go to the IMatch Versioning Page

Newest Features (since 2.3)

Newest Features (since 2.2)

Contributions

The following people have helped out significantly in the development of this script: Tim Armes (extensions for ver 1.4), Stefan Ahlswede (German Translation), and all of the beta testers who have offered their time, suggestions and critical input.

Introduction

The intention of this script is to allow one to manage multiple versions of the same image within the database. Often, a photographer saves the original photo into the IMatch database and then creates one or more derived copies (which might include Photoshop edits, resized versions for the web, JPEGs extracted from the RAW, etc.).

Unfortunately, the current version of IMatch does not natively support versioning, so each derived version of the same image is treated separately. Since most users rely on the ability to search for images based on assigned categories, it is important to have the categories added to each version of the image. In other words, the location Africa might have been applied to the original photo, but this same category should also be applied to any resized versions of the same photo.

This script runs through the currently open database and attempts to identify all derived images (ie. non-originals). With these derived versions, a search is performed to locate the original file for each. A reliance on strict naming conventions allows this to be done automatically.

Installation

  1. Copy this script into your IMatch scripts directory
  2. Run the program and change the configuration settings to match your database
  3. Run once in Dry Run mode (the default), which previews what the script will do without making permanent changes
  4. Examine the report file to see any errors that are reported (see the end of the report file)
  5. Rerun the script and make any necessary changes to your configuration
  6. Run in Normal mode (Dry Run disabled) to perform the actual scripted operations

Naming Strategy

Please see my Naming Strategy page at ImpulseAdventure.com for details on naming conventions.

This script classifies every file within the database as being either a controlled file or an uncontrolled file. Controlled files are ones that follow the naming convention, and can therefore be analyzed by this script. Uncontrolled files are all of those which do not follow the convention, and are left alone by this script. This distinction is made because most users will have non-photo files in their database (eg. downloaded images from the internet, photos from friends that haven't been named and sorted, etc.). Because these files will not follow any strict naming convention (eg. paris_2004 spike-and-mike.jpg), it would not make sense to parse the filenames.

The components of a file name are: prefix, body, suffix and extension. In a file named 20010318_r8122_21-e1.jpg, the prefix is 20010318_, the body is r8122_21, the suffix is -e1 and the extension is .jpg.
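The split described above can be sketched in a few lines of Python. This is only an illustration, not the script's own code: the function name split_name and its parameters are hypothetical, and the defaults (an 8-digit prefix, "_" as the Prefix End Char and "-" as the Derived Version Char) are taken from the examples on this page. The sketch does not check that the prefix really consists of digits.

```python
import os
from typing import NamedTuple


class NameParts(NamedTuple):
    prefix: str
    body: str
    suffix: str
    extension: str


def split_name(filename: str, prefix_length: int = 8,
               prefix_end_char: str = "_",
               derived_version_char: str = "-") -> NameParts:
    """Split a controlled file name into prefix, body, suffix and extension."""
    stem, extension = os.path.splitext(filename)
    # The prefix is the leading characters plus the Prefix End Char.
    prefix = stem[:prefix_length + len(prefix_end_char)]
    rest = stem[len(prefix):]
    # Everything from the first Derived Version Char onward is the suffix.
    body, sep, tail = rest.partition(derived_version_char)
    return NameParts(prefix, body, sep + tail, extension)


print(split_name("20010318_r8122_21-e1.jpg"))
```

An original file (one without a Derived Version Char) simply comes back with an empty suffix.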

All controlled files are recognized by a match to the file naming pattern configured in the script (see the Configuration Options below). For example, one can configure the script to recognize all photos that start with an eight-digit number and an underscore (eg. 20041031_1234.jpg) as a controlled file.

Operation

The user has the option of either running the script on a selection or the entire database. By default, Manage Versions operates on the entire database (when Selection Only is disabled). If Selection Only is Enabled, the user must first select a category, folder, a group of files, or the entire database (by clicking on the @All category or root database folder) before running the script.

For the following discussion, we will assume that the script is processing the images in your database and has come across one named: 20010318_r8122_21-e1.jpg.

The script tries to determine whether or not an image is a controlled file. If it is not controlled (ie. uncontrolled), then the script moves on to the next image. Manage Versions allows one to be very specific about which files are to be processed (ie. controlled), when one checks the Strict Prefix mode. Normally, Manage Versions assumes that all files within the database (or selection) should be processed (ie. controlled). Please see the Advanced Users section below for more details on how Strict Prefix is used.

For a controlled file, the script then examines the rest of the filename to see if it includes a suffix (eg. -s, -e, etc.). Suffixes are usually denoted by a hyphen (or your Derived Version Char), followed by one or more characters and/or numbers. These are used to indicate the state or derived version from the original image. States might include edit, web-version, print-version, resized, etc. The important differentiator here is that the original image filename does not have a suffix, while all derived versions have a suffix. By following such a convention, it is a fairly simple matter to locate the original file from a derived file. Note that this script allows multiple "suffixes", which can be used to record a multi-step edit history, for example (eg. 20010318_r8122_21-e-s.jpg). The script basically treats everything after the first Derived Version Char as the suffix.

If the script finds a suffix, it marks the file as non-original (hereafter called the derived version) and attempts to locate the original version. The original's file name is generated by taking the derived version filename and removing the suffix (ie. the Derived Version Char and everything after it). In the above example, it will be assumed that the original file is called 20010318_r8122_21.jpg. A check is done to see if the derived version has been assigned to the derived state category (defined by Category for Derived Images). If this special category wasn't assigned to the derived version, a warning message is reported and the derived state category is added to the image. This way, all derived versions should be marked with this derived state category, making it easy for one to find and distinguish later.

File Types

The script also handles the situation where the derived version might not have the same file type (ie. extension) as the original. This is often the case with RAW->JPEG or BMP->JPEG converted files.

A sequence of file types / extensions is searched, with priority given to RAW file formats, then uncompressed / lossless types and then compressed formats. In other words, the script will first search for 20010318_r8122_21.crw, then 20010318_r8122_21.bmp and then 20010318_r8122_21.jpg, for example. You should make sure that your desired file types are listed in the Order of Extensions configuration (see below).
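As a rough sketch of this search order, the following Python snippet builds the list of original file names that would be tried for a given derived file. The function name candidate_originals is hypothetical, and the assumption that the Order of Extensions list holds extensions without a leading dot is mine; the script's actual storage format may differ.

```python
import os


def candidate_originals(derived_name, extensions, derived_version_char="-"):
    """Return the file names to try, in priority order, for the original."""
    stem, _ = os.path.splitext(derived_name)
    # Drop the Derived Version Char and everything after it.
    original_stem = stem.split(derived_version_char, 1)[0]
    return [f"{original_stem}.{ext}" for ext in extensions]


# Comma-separated list with no spaces, as used by the list-based options.
order_of_extensions = "crw,bmp,jpg".split(",")
print(candidate_originals("20010318_r8122_21-e1.jpg", order_of_extensions))
```

The first candidate that actually exists in the database would be taken as the original.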

Directory Structure

In addition to the generation of the original's filename, some flexibility has been provided to allow for the derived version to appear in a different directory from the original version. Some photographers keep the derived and original versions separate, and this can be configured through the Searching for Originals configuration options (see below). In particular, four directory strategies are supported:

  1. having originals in the Same Folder as Derived versions
  2. different root folder(s) but same hierarchy below (located by Search and Replace Folder operations)
  3. originals in a Subfolder of Derived folder.
  4. Anywhere in Database. In this mode, the script searches the entire database for the original file, irrespective of the directory structure of the derived file. This is an advantage with a flexible directory structure, but it won't work well if you have duplicate filenames in your database.
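The four strategies can be pictured with the small Python sketch below. It is an illustration under stated assumptions: the function name original_folder, the mode names and the parameter names are all hypothetical, and replacing only the first occurrence of the Search String is a guess rather than documented behaviour.

```python
import posixpath


def original_folder(derived_folder, mode, search="", replace="", subfolder=""):
    """Return the folder expected to hold the original, or None when the
    whole database must be searched by file name."""
    if mode == "same":
        return derived_folder
    if mode == "search_replace":
        # Assumption: only the first occurrence of the Search String is replaced.
        return derived_folder.replace(search, replace, 1)
    if mode == "subfolder":
        return posixpath.join(derived_folder, subfolder)
    if mode == "anywhere":
        return None
    raise ValueError(f"unknown mode: {mode}")


print(original_folder("E:/Pictures/Work/2004/2004-10-31", "search_replace",
                      search="/Work/", replace="/Raw/"))
print(original_folder("E:/Pictures/2004/2004-10-31", "subfolder",
                      subfolder="original"))
```

The paths use forward slashes to match the examples on this page.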

Synchronizing Categories & Properties

There is a special situation that must be handled carefully when synchronizing categories and properties between original and derived images: extras. This topic is fairly complicated, so if you are curious, read about it in the Advanced Users section.

I strongly recommend that you follow the guidelines listed in the next section. By following these suggestions, you should not have to worry about the "extra" categories and properties!

Easy Workflow Suggestions

To ensure that synchronization is performed easily, please consider the following suggestions:

Please see the configuration options below for a definition of each of the Copying Categories and Copying Properties settings.

Dry Run / Safe Mode

As there are a number of operations that can be performed by this script, a couple of important options are provided for debugging purposes. Most important of all is the ability to run the script without actually making any changes to the database (Dry Run enabled). This is the recommended way to run the script the first time. This way you can identify what will happen (and fix anything if necessary) before you perform the actual operation with Dry Run mode disabled.

Configuration Options

All configuration options are stored automatically by the script and recalled the next time you run it. The first time that you run Manage Versions, the configuration settings will all be the defaults and they will not likely match the values you need for your database. Therefore, it's important that you set up the configuration to your needs. As the default is to run Manage Versions in Dry Run mode, you won't do any harm by running the script on your database with incorrect settings!

Note that for all of the list-based configuration options (eg. Order of Extensions or Categories to Ignore), each value is separated by a comma, with no spaces between elements.

In most of the options where the user can specify a category, a button with "..." can be pressed that will cause the category selection dialog box to appear.

Once the options have been selected, the user can press the "OK" button and the script will verify a number of settings, flagging an error if one exists (e.g. bad category field) before moving on to execute.

Language

Defines the language to run the script in.

English: English
Deutsch: German

Dry Run

Defines whether or not the script actually makes any changes to your database. This provides a good way to determine what the script will do before you actually make the changes to your database.

Enabled: the script only reports the intended actions but does not actually perform any work.
Disabled: the script will perform the normal mode operations as expected.

Verbose Messages

Increased level of messaging for controlled files.

True: all controlled files are reported in the output.

Report File

Determines whether or not a dialog box is brought up to select the report file for output.

Prompt for File : the user selects a suitable file for output.

Automatic Overwrite : the file specified by File is used for output. In this mode no warnings are given if the file already exists, and the contents will simply be overwritten.

Debug Messages

Displays the categories attached to each controlled file.

True: All categories for each controlled file are printed in the output report.

Selection Only

Determines which images are processed in the script.

Enabled: Only images in the current selection are processed.

Disabled: All images in the database are processed (default).

Copying Categories

Defines the way in which categories are copied between original and derived versions. Also defines how extra categories are handled. Extra categories are defined as categories that exist in a derived file that are not attached to the original file, or as categories that exist in the original file that are not attached to the derived versions.

Disabled:
No changes are made to the image categories.

Original -> Derived :
All categories (excluding Categories to Ignore) are copied from original files to derived versions, but extra categories in the derived versions are left alone.

Original -> Derived Forced:
All categories (excluding Categories to Ignore) are copied from original files to derived versions, and extra categories in the derived versions are removed (true synchronization).

Derived -> Original:
All categories (excluding Categories to Ignore) are copied from derived versions to original versions, but extra categories in the original version are left alone.

Derived -> Original Forced:
All categories (excluding Categories to Ignore) are copied from derived versions to original versions, and extra categories in the original versions are removed (true synchronization).

Derived <-> Original:
All categories (excluding Categories to Ignore) are copied from derived versions to original versions, and original to derived. This mode helps ensure that all categories are found and spread to all versions of the file, but it does not ever remove any categories. NOTE: This mode might copy categories from derived versions to originals requiring a second run of the script to ensure that these new categories in the original are then applied to all derived versions.

Original -> Derived Interactive:
All categories (excluding Categories to Ignore) are copied from original files to derived versions, and extra categories in the edited versions are brought to the attention of the user through a dialog box for resolution (true synchronization). Default.
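The difference between the normal and Forced modes can be sketched with Python sets. This is a simplified model, not the script's implementation: the function name sync_categories is hypothetical, categories are treated as plain strings, and entries in Categories to Ignore are assumed to match by exact name.

```python
def sync_categories(source, destination, ignore=frozenset(), forced=False):
    """Return the destination's category set after a one-way copy."""
    copied = set(source) - set(ignore)
    if forced:
        # Forced: extras in the destination are removed, except ignored ones.
        kept = set(destination) & set(ignore)
    else:
        # Normal: extras in the destination are left alone.
        kept = set(destination)
    return copied | kept


original = {"Location.Egypt", "Rating.Good"}
derived = {"Location.Alaska", "Rating.Excellent", "State.Edited"}
ignore = {"Rating.Good", "Rating.Excellent", "State.Edited"}

print(sorted(sync_categories(original, derived, ignore)))
print(sorted(sync_categories(original, derived, ignore, forced=True)))
```

In this model, Original -> Derived passes the original as the source, Derived -> Original swaps the two arguments, and Derived <-> Original corresponds to running the normal (non-forced) copy in both directions.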

Copying Properties

Defines the way in which properties are copied between original and derived versions. Also defines how extra properties are handled. Extra properties are defined as properties that have a non-null value in a derived version file that are null in the original file, or as properties that have a non-null value in the original file that are null in the derived versions.

Disabled:
No changes are made to the image properties.

Original -> Derived:
All property values (excluding Properties to Ignore) are copied from original files to derived versions, but extra property values in the derived versions are left alone (ie. even though a property is null in the original, the corresponding non-null value in the derived version is not forced to null). Default.

Original -> Derived Forced:
All property values (excluding Properties to Ignore) are copied from original files to derived versions, and extra property values in the derived versions are cleared (true synchronization).

Derived -> Original:
All property values (excluding Properties to Ignore) are copied from derived versions to original files, but extra property values in the original file are left alone (ie. even though a property is null in the derived version, the corresponding non-null value in the original version is not forced to null).

Derived -> Original Forced:
All property values (excluding Properties to Ignore) are copied from derived versions to original files, and extra property values in the original file are cleared (true synchronization).

Derived <-> Original:
All property values (excluding Properties to Ignore) are copied from derived versions to original versions, and original to derived. This mode helps ensure that all property values are found and spread to all versions of the file, but it does not ever clear (ie. set to null) any properties. NOTE: This mode might copy property values from derived versions to originals requiring a second run of the script to ensure that these property values in the original are then applied to all derived versions. If this occurs, a warning message will be displayed in the log that indicates a second run is recommended.
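Property copying differs from category copying in that a property is either null or holds a value. The following Python sketch models that behaviour with dictionaries, using None for null. The function name sync_properties and the property names in the example are hypothetical, and the sketch is an interpretation of the descriptions above rather than the script's actual code.

```python
def sync_properties(source, destination, ignore=(), forced=False):
    """Return the destination's properties after a one-way copy.

    Properties are dicts mapping name -> value; None stands for null.
    """
    result = dict(destination)
    for name in set(source) | set(destination):
        if name in ignore:
            continue  # ignored properties are never touched
        value = source.get(name)
        if value is not None:
            result[name] = value   # copy non-null source values
        elif forced:
            result[name] = None    # forced: clear extras in the destination
    return result


original = {"Title": "Sunset", "Notes": None, "CRC": "aa"}
derived = {"Title": None, "Notes": "cropped", "CRC": "bb"}

print(sync_properties(original, derived, ignore={"CRC"}))
print(sync_properties(original, derived, ignore={"CRC"}, forced=True))
```

In the normal run the derived file keeps its extra Notes value; in the forced run it is cleared. The CRC value is left alone in both cases because it is on the ignore list.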

Searching For Originals

Selects the type of directory structure used to locate the original file from the derived versions.

  • Same Folder as Derived - The original files are located in the same directory as the derived files. Default.

    • For example:
      Derived: E:/Pictures/2004/2004-10-31/20041031_1243-e.jpg
      Original: E:/Pictures/2004/2004-10-31/20041031_1243.jpg

  • Search and Replace Folder - The original files are located by search-and-replacing part of the path from the derived version's path. This allows several different organizations to work, including: 1) The original files are located in a different root directory, but maintain the same hierarchy below as the derived versions. 2) The derived files are located within a subdirectory below the originals.

    • First example:
      Derived: E:/Pictures/Work/2004/2004-10-31/20041031_1243-e.jpg
      Original: E:/Pictures/Raw/2004/2004-10-31/20041031_1243.jpg

    • Second example:
      Derived: E:/Pictures/2004/2004-10-31/edit/20041031_1243-e.jpg
      Original: E:/Pictures/2004/2004-10-31/20041031_1243.jpg

    In this mode of operation, the script locates the Search String in the derived version's directory path and replaces it with the Replace String.

  • Subfolder of Derived - The original files are located in a subdirectory below the derived versions. This doesn't seem like a likely arrangement, but support has been provided, just in case.

    • For example:
      Derived: E:/Pictures/2004/2004-10-31/20041031_1243-e.jpg
      Original: E:/Pictures/2004/2004-10-31/original/20041031_1243.jpg

    In this mode of operation, the script adds Subfolder to the derived version's directory path to obtain the original version's path.

  • Anywhere in Database - This is the most flexible search option. Original files can be located anywhere in the database. This mode searches the entire database for matches, based purely on filename, irrespective of directory hierarchy. The only potential disadvantage of this approach is that the searching of files is not deterministic if multiple files exist with the same filename in different folders.

Strict Prefix

Defines whether or not a multi-digit prefix is required to indicate a file is controlled.

Enabled - Files that are controlled by this script must only contain digits in the first Prefix Length characters of the file name and are then followed by the Prefix End Char character.

Disabled - Files that are controlled by this script can have any characters before Derived Version Char. This mode is only practical if you ensure that all the uncontrolled files in your database don't contain the Derived Version Char character (eg. hyphen -).

Prefix Length

When Strict Prefix is Enabled, defines the number of digits that are expected at the start of every controlled file name.

Prefix End Char

When Strict Prefix is Enabled, defines the character that must follow the digit sequence.

Derived Version Char

Identifies the character that separates a controlled file's name body from the version suffix. Note that if Strict Prefix is Disabled, then this character should not appear in a file name within your database, unless that file is intended to be controlled by the script.

Category for Derived Images

Defines the category path to the derived state used to mark images that are non-originals.

Note: It is expected that the user will need to configure this parameter to match their database setup.

Tag Original Images

If Enable Tag Original is set, then all original files in the database are added to the category specified.

In this mode, the script will add the category (if necessary) to an original file located anywhere in the database. Even if the script is run in Selection mode and the Originals weren't selected, the script will still add the category.

Note: It is expected that the user will need to configure this parameter to match their database setup.

Categories to Ignore

Defines all of the categories that will be ignored when comparing original and edited versions. This is necessary because some tags will depend on the version of file. This is most often used to allow state and rating values to differ between versions of the same file.

Note: It is expected that the user will need to configure this parameter to match their database setup.

Properties to Ignore

Defines all of the properties that will be ignored when comparing original and edited versions. This is necessary because some properties will be dependent upon the version of file. For example, one might have a property field that is used to store the file's CRC value. In this scenario, one probably wouldn't want this value to be synchronized between versions -- and so one would add this property to the properties to ignore list.

Note: It is expected that the user will need to configure this parameter to match their database setup.

Order of Extensions

Defines all of the file types / extensions that are searched when trying to locate an original version. This list is searched in order (from first entry until the last), allowing the script to search for RAW / uncompressed file formats first.

For any comments, suggestions or future improvements, please Contact Calvin.


Details for Advanced Users

Strict Prefix

If Strict Prefix is enabled, then the determination of whether or not an image is controlled is done by searching for a prefix (eg. 20010318_) in the filename. A filename prefix is made up of one or more digits (the Prefix Length) followed by a special character (the Prefix End Char). When Strict Prefix mode is enabled, a file named "20010318_1234.jpg" will be identified as a controlled image, whereas "john bday 001.jpg" will not be. In the above example, my Prefix Length is 8 and my Prefix End Char is _. When Strict Prefix mode is disabled, all files are processed (ie. controlled). The advantage of using Strict Prefix mode is that it allows you to limit the script to working on files that you know follow your file naming scheme.

Be aware that if Strict Prefix mode is disabled and you have original filenames that include the Derived Version Char (eg. john-bday 001.jpg), you will cause confusion for the script -- the best solution is to select a Derived Version Char that does not appear in any of your original filenames.
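The Strict Prefix test described in this section can be approximated with a regular expression. The Python sketch below is only an illustration: the function name is_controlled and its parameters are hypothetical, and the defaults (Prefix Length 8, Prefix End Char "_") come from the example above.

```python
import re


def is_controlled(filename, strict_prefix=True, prefix_length=8,
                  prefix_end_char="_"):
    """Return True if the file name would be treated as controlled."""
    if not strict_prefix:
        return True  # with Strict Prefix disabled, every file is processed
    # Exactly prefix_length digits followed by the Prefix End Char.
    pattern = r"[0-9]{%d}%s" % (prefix_length, re.escape(prefix_end_char))
    return re.match(pattern, filename) is not None


print(is_controlled("20010318_1234.jpg"))
print(is_controlled("john bday 001.jpg"))
print(is_controlled("john-bday 001.jpg", strict_prefix=False))
```

The last call shows the caveat mentioned above: with Strict Prefix disabled, john-bday 001.jpg counts as controlled, so its hyphen would be read as the start of a version suffix.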

Extra Categories & Properties

As mentioned above, the mechanism of synchronizing categories and properties between original and derived versions can get complicated. If you follow the workflow suggestions listed above, you shouldn't have to worry about these. However, for those of you who are not using the suggested methodology, it is worth understanding the issue.

As the Manage Versions script processes each file, it determines whether or not it is a controlled file. If it is controlled and is identified as a derived file, it proceeds to search for the original file. Once the original file has been located, a comparison is performed between the categories associated with the original and the categories associated with the derived version. Most of the time you will want a similar set of categories assigned to both the original and derived versions. Keeping these categories synchronized is the main purpose of Manage Versions.

Usually, the original file has been added to more categories than the derived version, but occasionally there exist "extra" categories in a derived version which are not assigned to the original. While this is not usually the case, it is sometimes necessary when dealing with certain categories. For example, let's say that your list of available categories includes ones such as state (edited, web, scanned, etc.) and ratings (excellent, favorite, good, etc.). You might have an image in which the original is OK, but a slightly edited version of this image (a derived file) looks far better. So, you might want to assign the derived file to the Excellent category, but not the original. Manage Versions allows for these special categories by providing a configurable list of categories to ignore when performing the synchronization (see CATEG_IGNORE_LIST below).

The main use for the Manage Versions script is to copy (synchronize) categories and properties from the original images to their derived versions, so it is important that a check is performed to see if these extra categories exist in the derived version. This is best explained by an example. An original photo is tagged with Location.Alaska and the derived image file doesn't have any "location" tags attached. Ideally, this script should simply copy Location.Alaska to the derived / edited version. But now let's say that you discover someday that the photo should have actually been assigned to Location.Egypt and not Location.Alaska!

The average user will simply correct the category assignment on the original version and forget to make the same modification to all images that were derived from this original (eg. maybe additional versions of the photo were made for the web, print, etc.). But here is the problem. IMatch will show that the original photo is assigned to Location.Egypt but the derived versions are still assigned to Location.Alaska.

What should the script do? Ideally, it will recognize that the original's categories have changed and that the derived version's Location.Alaska assignment should now be updated to Location.Egypt. This could be accomplished by removing all categories (outside CATEG_IGNORES) in the derived versions and simply copying over all categories from the original version into the derived versions.

But, the above strategy has a problem: what if the "average user" accidentally fixed the "Location" tag (or added another category) in the derived version instead of the original version? The script would see that the original had Location.Alaska and the derived version with Location.Egypt. With the default copy procedure described above, the derived version's Location tag would be updated to Location.Alaska, undoing the work you did in updating the categories. Therefore, we have to be careful about how these extra categories are handled!

In Manage Versions 1.5 and above, the user has a lot of flexibility in the way that the category and property synchronization is performed (and the handling of the extras). The script can synchronize both the categories and properties between versions. For category copying, one can either copy from Original to Derived versions (default method) or Derived to Original versions. Within each of these two copy modes, there are two variants: normal and forced operation. In normal mode, categories or properties are copied, but if extra categories / properties exist in the destination, they are left untouched. In forced mode, these extra categories / properties are removed first before the copy, in essence allowing true synchronization between versions.

The best way to understand the difference is to understand how the script behaves differently. In forced mode, the script removes all tags from the destination (Derived version when in Original -> Derived mode) and then copies all categories / properties from Original to the Derived versions. In normal mode, the script simply copies all tags from Original to Derived versions. Note that the non-forced mode of operation doesn't remove categories / properties in the destination if they were removed in the source.