Tools that apply Burrows-Wheeler algorithm. Burrows-Wheeler transform (BWT) is an algorithm maximizing repeated letters in a text, which is useful in data compression.
Burrows–Wheeler Transform - dCode
Tag(s) : Compression
dCode is free and its tools are a valuable help in games, maths, geocaching, puzzles and problems to solve every day!
A suggestion ? a feedback ? a bug ? an idea ? Write to dCode!
The Burrows-Wheeler transform (BWT) reorganizes the characters of a message and associates a key with it. This is not a compression, but a preprocessing prior to a compression. The BW transform tends to bring identical characters together, this proximity then allows increased compression (for example by RLE coding).
The first step consists in listing all possible rotations of the message (of the character string).
The second step is sorting this list in alphabetical order.
The ciphered message is constituted of the last letters of each rotation. The associated key is the rank of the original message in the list.
Example: The encrypted message is EEODDC. The key is 2 (DECODE, the original text, is on the line 2 if the table).
dCode accepts only ASCII characters (without line return)
Decryption/Burrows Wheeler Inverse Transformation requires to know the key and the ciphered message (with N characters).
Example: The ciphertext EODC (4 characters) and the key 1
Step A: initialize an empty array with N rows and N columns.
Step B: write the encrypted message in the last empty column of the table
Step C: sort the rows of the table in alphabetical order
Repeat steps B and C as many times as there are letters in the message.
Example: State of the table after each step:
When finished, the plaintext is at the line number key of the table.
Example: At the row 1, after the last step of the algorithm, is the plain message: CODE
The key is not really important for intelligible text, because when decrypting, all lines of the table are in fact rotations of the original text.
BWT can be used without a key, but in this case, a unique character of the original text and its position are needed, for instance in computer EOF character is used for last one.
Several implementations are possible but the best ones are in $ O(n) $ for the duration and $ O(n \log \sigma) $ (or even better) for the memory. With $ n $ the input size and $ \sigma $ the size of the alphabet.
In 1994 by Michael Burrows and David Wheeler