
Friday, March 29, 2013

Fixing filename encoding using a PHP console script

Recently I received an archive with files required for a project. It was awful: the filenames were in Cyrillic, and on top of that their encoding was broken. So what do you do when something like this happens? I wrote a little PHP script that walks all directories and files and renames each one, converting its name to the appropriate encoding:



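<?php
// Encodings used to repair the filenames; adjust them to match your files.
const FromEncoding = 'utf-8';
const ToEncoding = 'cp1251';

function convertEncoding($filename) {
    return mb_convert_encoding($filename, ToEncoding, FromEncoding);
}

// Recursively walks $dir and renames every file and directory in it.
function traverse($dir) {
    // Read the whole listing first: renaming entries while readdir() is still
    // iterating can make renamed entries show up (and be converted) twice.
    $entries = scandir($dir);

    if ($entries === false) {
        echo "\n* * *\tError: Cannot open directory ($dir)\n";
        return;
    }

    echo "* * *\tENTERING: " . $dir . PHP_EOL;

    foreach ($entries as $filename) {
        if ($filename === '.' || $filename === '..') {
            continue;
        }

        echo $dir . '/' . $filename;

        $newName = convertEncoding($filename);

        // Only rename when the name actually changes, and report failures.
        if ($newName !== $filename && !rename($dir . '/' . $filename, $dir . '/' . $newName)) {
            echo "\n* * *\tError: Cannot rename ($dir/$filename)\n";
            $newName = $filename; // keep walking under the old name
        }

        echo "\n\t=>\t" . $newName . PHP_EOL;

        if (is_dir($dir . '/' . $newName)) {
            traverse($dir . '/' . $newName);
        }
    }
}

if (isset($argv[1]) && is_dir($argv[1])) {
    chdir($argv[1]);
    traverse(getcwd());
} else {
    die("\n* * *\tError: Cannot open directory (" . ($argv[1] ?? '') . ")\n");
}

echo "\n* * * DONE * * *\n";
?>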


Save it as mb_conv.php, then open a terminal window and run it, passing the directory to fix:

php mb_conv.php /PATH
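Why does converting from utf-8 to cp1251 fix the names? One likely cause, and it is only an assumption here, is double encoding: the names' UTF-8 bytes were at some point read as cp1251 characters and saved again as UTF-8. This small self-contained demo simulates that damage and repairs it with the same mb_convert_encoding() call that convertEncoding() makes. The sample name 'Документ' is made up for illustration, and, like the script, the demo needs the mbstring extension.

```php
<?php
// Sample name, made up for this demo.
$original = 'Документ';

// Simulate the (assumed) damage: the UTF-8 bytes are read as cp1251
// characters and saved again as UTF-8.
$broken = mb_convert_encoding($original, 'utf-8', 'cp1251');

// The repair: the same conversion convertEncoding() performs.
$fixed = mb_convert_encoding($broken, 'cp1251', 'utf-8');

echo $broken . PHP_EOL; // Р”РѕРєСѓРјРµРЅС‚
echo ($fixed === $original ? 'repaired' : 'still broken') . PHP_EOL; // repaired
```

If your names are instead stored as raw cp1251 bytes, the conversion has to go the other way.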


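Later I found a really good tool that does the same job: convmv.
If you are using Ubuntu, you can install it and run it from a terminal:
sudo apt-get install convmv
convmv --notest -r -f cp1251 -t utf8 DIR
Here -f and -t set the source and target encodings, -r recurses into subdirectories, and --notest makes convmv actually rename the files; without it, convmv only prints what it would do. Note that this command converts from cp1251 to utf8, the reverse of the script's constants, so pick the direction that matches how your names are broken.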
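Like running convmv without --notest, it can be handy to see what the PHP script would rename before it touches anything. A minimal sketch of such a preview, assuming the same utf-8 to cp1251 direction as the script's constants; previewRenames() is a hypothetical helper, not part of the original script:

```php
<?php
// Hypothetical helper (not part of the original script): walks $dir and
// returns [oldPath => newPath] for every name the conversion would change,
// without renaming anything.
function previewRenames(string $dir, string $from = 'utf-8', string $to = 'cp1251'): array
{
    $plan = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $plan;
    }

    foreach ($entries as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }

        $path = $dir . '/' . $name;
        $newName = mb_convert_encoding($name, $to, $from);
        if ($newName !== $name) {
            $plan[$path] = $dir . '/' . $newName;
        }

        // Nothing has been renamed, so recurse under the current name.
        if (is_dir($path)) {
            $plan += previewRenames($path, $from, $to);
        }
    }

    return $plan;
}
```

Call it as previewRenames('/PATH') and print the returned array. Because nothing is renamed, entries inside a directory that would itself be renamed are listed under the directory's old name.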


Hope that helps someone out there :).
