
namealign

The program namealign renames files so that their names are formatted uniformly.

It follows the same idea of character classification as the tool pftdbns, but it does not sort files into directories. Instead, it renames files that share the same filename macrostructure so that they are formatted identically (e.g. by inserting a leading 0 in a numbered part of the name).

Name/String, Character Classes, Structure (Micro-/Macrostructure)

In the description of the tool pftdbns I used the term "name structure", or simply "structure". I now call this name structure (string structure) the microstructure, to distinguish it from the macrostructure, which I introduced for the namealign tool.
The following sections explain these and the other terms used here.

In short: character classes are used to detect the microstructure of a string. The microstructure is therefore an abstraction of the string, based on character classes.
The macrostructure of a string is, in turn, an abstraction of its microstructure.

Character Classes

For this tool I use only a few common character classes. If you are used to regular expressions, you may know further character classes (for example "lowercase" or "alnum"). Because name alignment has a special purpose (reformatting strings so that they are formatted alike), only the character classes listed below are distinguished.

character(s)        | character class | character class short-name
a..z, A..Z          | letter          | l
0..9                | integer         | i
space, tab, newline | blank           | b
dot (".")           | dot             | d
slash ("/")         | slash           | s
other characters    | other           | o
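The table above can be written down as a small classification function. This is only a sketch in Python, not the tool's OCaml source; the function name `char_class` is an assumption made for illustration.

```python
import string

def char_class(ch: str) -> str:
    """Return the short name of the character class of a single character."""
    if ch in string.ascii_letters:   # a..z, A..Z
        return "l"
    if ch in string.digits:          # 0..9
        return "i"
    if ch in " \t\n":                # space, tab, newline
        return "b"
    if ch == ".":
        return "d"
    if ch == "/":
        return "s"
    return "o"                       # everything else

print([char_class(c) for c in "a1 ./_"])  # ['l', 'i', 'b', 'd', 's', 'o']
```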

Micro-Structure of a String

If you map each character of a string to its character class, and represent each class by the single character that serves as its short name, you perform the same mapping as the tool pftdbns.
The string that results from substituting every character of the input string by its class character is the representation of the string's microstructure.
Examples:
Example string               | corresponding microstructure | alternative notation of microstructure
hallo_this_is_an_example.txt | lllllollllollollollllllldlll | 5l1o4l1o2l1o2l1o7l1d3l
my_holiday-pictures-1.jpg    | llolllllllolllllllloidlll    | 2l1o7l1o8l1o1i1d3l
my_holiday-pictures-2.jpg    | llolllllllolllllllloidlll    | 2l1o7l1o8l1o1i1d3l
my_holiday-pictures-3.jpg    | llolllllllolllllllloidlll    | 2l1o7l1o8l1o1i1d3l
my_holiday-pictures-23.jpg   | llolllllllolllllllloiidlll   | 2l1o7l1o8l1o2i1d3l
my_holiday-pictures-183.jpg  | llolllllllolllllllloiiidlll  | 2l1o7l1o8l1o3i1d3l
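Both notations from the table can be computed mechanically. The following Python sketch is one way to do it; it is not taken from pftdbns or namealign, and the names `microstructure` and `compact` are assumptions made for illustration.

```python
import string
from itertools import groupby

def char_class(ch: str) -> str:
    """Short name of the character class of one character."""
    if ch in string.ascii_letters:
        return "l"
    if ch in string.digits:
        return "i"
    if ch in " \t\n":
        return "b"
    if ch == ".":
        return "d"
    if ch == "/":
        return "s"
    return "o"

def microstructure(name: str) -> str:
    """Replace every character by the short name of its class."""
    return "".join(char_class(c) for c in name)

def compact(micro: str) -> str:
    """Run-length notation: '<count><class>' for each run of equal classes."""
    return "".join(f"{len(list(run))}{cls}" for cls, run in groupby(micro))

micro = microstructure("my_holiday-pictures-23.jpg")
print(micro)
print(compact(micro))  # 2l1o7l1o8l1o2i1d3l
```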

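Because the tool pftdbns looks at the microstructure, it would group together only files whose microstructure is identical. In the example above these are my_holiday-pictures-1.jpg, my_holiday-pictures-2.jpg and my_holiday-pictures-3.jpg; each of the other files would end up in a group of its own.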

This makes sense, because files that are created automatically often follow a fixed filename microstructure, so selecting files by their naming structure is often helpful.
In the above example, however, it is not much help if you want to group all the holiday pictures together. This is why we build a further abstraction on top of the microstructure, which we call the macrostructure.

Macro-Structure of a String

As the holiday-picture example showed, the microstructure does not help in grouping these files together, because the pictures were not numbered in a fixed format.
The number of digits in the picture number is not fixed (it grows with the count), so the files do not share one microstructure.
You might argue that this means the pftdbns tool does stupid things. But when you use a picture viewer to view your files, the order will not be what you expect from looking at the numbers either.

The order will be:
my_holiday-pictures-1.jpg,
my_holiday-pictures-183.jpg,
my_holiday-pictures-2.jpg,
my_holiday-pictures-23.jpg,
my_holiday-pictures-3.jpg
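The order above is what a plain character-by-character string sort produces. The Python sketch below reproduces it, assuming the viewer sorts file names this way; the zero-padded names are hypothetical and only illustrate why a fixed number of digits fixes the order.

```python
names = [
    "my_holiday-pictures-1.jpg",
    "my_holiday-pictures-2.jpg",
    "my_holiday-pictures-3.jpg",
    "my_holiday-pictures-23.jpg",
    "my_holiday-pictures-183.jpg",
]

# Plain string comparison: "." sorts before every digit,
# so "...-1.jpg" < "...-183.jpg" < "...-2.jpg".
by_string = sorted(names)
for name in by_string:
    print(name)

# Hypothetical zero-padded variants of the same names, listed in numeric order.
padded = [f"my_holiday-pictures-{n:03d}.jpg" for n in (1, 2, 3, 23, 183)]
print(sorted(padded) == padded)  # True: string order now equals numeric order
```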

This might disturb you when you want to show your pictures to an audience, especially if you discover the chaos only when you start your presentation and had not looked at the files beforehand (so you could not fix it by renaming your hundreds of files by hand ;-)).

So what we now introduce is the macrostructure of a string. It is again written with one letter per character class, but in uppercase: the macrostructure uses the same letters as the microstructure, as uppercase instead of lowercase letters.

The macrostructure of a string is like its microstructure, except that the number of successive occurrences of the same character class is ignored. It thus abstracts away the numbers that we saw in the alternative notation of the microstructure (shown in the table above).
In other words: it is a filter on the microstructure that ignores the repetition counts.

The next table shows each filename string, its microstructure and its macrostructure.

Example string               | corresponding microstructure | representation of macrostructure
hallo_this_is_an_example.txt | lllllollllollollollllllldlll | LOLOLOLOLDL
my_holiday-pictures-1.jpg    | llolllllllolllllllloidlll    | LOLOLOIDL
my_holiday-pictures-2.jpg    | llolllllllolllllllloidlll    | LOLOLOIDL
my_holiday-pictures-3.jpg    | llolllllllolllllllloidlll    | LOLOLOIDL
my_holiday-pictures-23.jpg   | llolllllllolllllllloiidlll   | LOLOLOIDL
my_holiday-pictures-183.jpg  | llolllllllolllllllloiiidlll  | LOLOLOIDL
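Collapsing each run of equal classes to one uppercase letter gives the macrostructure column of the table. A minimal Python sketch, again not the tool's own code; `macrostructure` is a name assumed for illustration.

```python
import string
from itertools import groupby

def char_class(ch: str) -> str:
    """Short name of the character class of one character."""
    if ch in string.ascii_letters:
        return "l"
    if ch in string.digits:
        return "i"
    if ch in " \t\n":
        return "b"
    if ch == ".":
        return "d"
    if ch == "/":
        return "s"
    return "o"

def macrostructure(name: str) -> str:
    """One uppercase class letter per run of characters of the same class."""
    return "".join(cls.upper() for cls, _run in groupby(name, key=char_class))

print(macrostructure("hallo_this_is_an_example.txt"))  # LOLOLOLOLDL
print(macrostructure("my_holiday-pictures-1.jpg"))     # LOLOLOIDL
print(macrostructure("my_holiday-pictures-183.jpg"))   # LOLOLOIDL
```

All five holiday pictures map to the same macrostructure, although they have three different microstructures.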

Renaming of the Files

If you now grouped your files into directories, one directory for each macrostructure, the grouping would be fine.
But in your picture viewer you would still have the trouble of the wrongly ordered sequence.
So what we should do now is rename the files so that they all have the same microstructure.

But before I forget to mention it: we should only rename files to the same microstructure if they have the same macrostructure. Otherwise we might run into bigger trouble than before: think about the examples above and what would happen if we tried to force all of those files, regardless of their macrostructure, into the same microstructure.

As you can see, it only makes sense to rename files that belong to the same macrostructure.
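The idea can be sketched as follows: group the names by macrostructure, then pad every integer part to the widest integer found at the same position within the group. This Python sketch covers only the integer padding and leaves all other characters untouched; it is an illustration under these assumptions, not the OCaml implementation, and `runs` and `align_integers` are hypothetical names.

```python
import string
from collections import defaultdict
from itertools import groupby

def char_class(ch: str) -> str:
    if ch in string.ascii_letters:
        return "l"
    if ch in string.digits:
        return "i"
    if ch in " \t\n":
        return "b"
    if ch == ".":
        return "d"
    if ch == "/":
        return "s"
    return "o"

def runs(name: str):
    """Split a name into (class, text) runs of equal character class."""
    return [(cls, "".join(chars)) for cls, chars in groupby(name, key=char_class)]

def align_integers(names):
    """Map each old name to a new name with zero-padded integer parts."""
    groups = defaultdict(list)
    for name in names:
        macro = "".join(cls for cls, _text in runs(name))
        groups[macro].append(name)

    renamed = {}
    for members in groups.values():
        split = [runs(n) for n in members]
        # Same macrostructure means the same number of runs in every member.
        widths = [max(len(r[k][1]) for r in split) for k in range(len(split[0]))]
        for name, r in zip(members, split):
            renamed[name] = "".join(
                text.zfill(w) if cls == "i" else text
                for (cls, text), w in zip(r, widths)
            )
    return renamed

files = ["hallo_this_is_an_example.txt", "my_holiday-pictures-1.jpg",
         "my_holiday-pictures-23.jpg", "my_holiday-pictures-183.jpg"]
for old, new in align_integers(files).items():
    print(old, "->", new)
```

The file hallo_this_is_an_example.txt has a different macrostructure, so it forms a group of its own and keeps its name.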

An automatism that spares the user from thinking about the renaming is a fine thing. But it seems best to me to give the user options to change the behaviour of the tool, because a single automatism that fits all needs is obviously not possible.



Options and Default Behaviour

Options (version 0.6, 0.7)

Version 0.7 of namealign has the following options:

-letter     letter will also be changed
-na         no action: show only what would be done if this flag is NOT set
-minint     integers: minimal size (remove leading 0's)
-s          single repetition: for all non-alnum chars only replace by one char
-show_all   show all: even files that will not be renamed will be shown
-verbose    verbose: same as show_all: even files that will not be renamed will be shown
-version    show program version and exit
-help       Display this list of options
--help      Display this list of options

The option -minint does NOT shrink each integer to the minimum size possible for an individual file.
It uses the minimum integer size that works for all files of the same macrostructure.
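A small Python sketch of this rule for one integer position within a group of files: strip the leading zeros first, then pad to the widest remaining number. The function name `minint` is borrowed from the option for illustration only, and keeping a single "0" for an all-zero field is an assumption of this sketch.

```python
def minint(fields):
    """Shrink integer strings to the smallest width common to all of them."""
    # Remove leading zeros; an all-zero field is kept as a single "0" (assumption).
    stripped = [f.lstrip("0") or "0" for f in fields]
    # The widest remaining number decides the width for the whole group.
    width = max(len(s) for s in stripped)
    return [s.zfill(width) for s in stripped]

print(minint(["0001", "0023", "0183"]))  # ['001', '023', '183']
print(minint(["007", "0010"]))           # ['07', '10']
```

The first field is not shrunk to "1", because "183" in the same group needs three digits.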

Default behaviour (version 0.7)

Character class | Default behaviour | Action changed by options
Blank   | EACH blank will be substituted by one "_" | with option -s, ALL successive blanks will be replaced by ONE "_"
Slash   | EACH slash will be substituted by one "_" | with option -s, ALL successive slashes will be replaced by ONE "_"
Letter  | leave them as they are | with option -letter, add "z" for each missing letter on the right side
Integer | add missing leading "0" (length of int-string is used) | with option -minint, throw away as many leading zeros as possible (first throw away leading zeros in general, then add missing zeros; the length of the int-string after removing leading zeros is used)
Dot     | multiple successive dots will be replaced by one dot | with option -s, ALL successive dots will be replaced by ONE dot only
Other   | EACH other character will be replaced by one "_" | with option -s, ALL successive other chars will be replaced by ONE "_"
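The rows for blanks, slashes, dots and other characters can be sketched in a few lines of Python. This is an illustration, not the tool's code: the function name `normalise` and its `single` parameter (standing in for the option -s) are assumptions, and runs are collapsed separately per character class, which is one possible reading of the table. Letters and integers are passed through unchanged here.

```python
import string
from itertools import groupby

def char_class(ch: str) -> str:
    if ch in string.ascii_letters:
        return "l"
    if ch in string.digits:
        return "i"
    if ch in " \t\n":
        return "b"
    if ch == ".":
        return "d"
    if ch == "/":
        return "s"
    return "o"

def normalise(name: str, single: bool = False) -> str:
    """Apply the default substitutions; single=True mimics the option -s."""
    out = []
    for cls, chars in groupby(name, key=char_class):
        text = "".join(chars)
        if cls in ("b", "s", "o"):
            # Default: one "_" per character; with -s: one "_" per run.
            out.append("_" if single else "_" * len(text))
        elif cls == "d":
            out.append(".")          # successive dots become one dot
        else:
            out.append(text)         # letters and integers stay as they are
    return "".join(out)

print(normalise("my  file--v1..txt"))               # my__file__v1.txt
print(normalise("my  file--v1..txt", single=True))  # my_file_v1.txt
```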

Download

The tool is implemented in the language OCaml and is now available for download: namealign download


last time changed: 28th of October 2007
Mail: oliver _at_ first.in-berlin.de