
Friday, March 29, 2013

Fixing filename encoding using a PHP console script

Recently I received an archive with files required for a project. It was awful: the filenames were in Cyrillic, and on top of that the encoding was broken. So what do you do when something like this happens? I wrote a little PHP script that walks all directories and files and renames each one using the appropriate encoding:



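<?php
// Source and target encodings; adjust them to match how the names were damaged.
const FromEncoding = 'utf-8';
const ToEncoding = 'cp1251';

// Convert a single file name from FromEncoding to ToEncoding.
function convertEncoding($filename) {
    return mb_convert_encoding($filename, ToEncoding, FromEncoding);
}

// Recursively rename every entry below $dir.
function traverse($dir) {
    // Read the whole listing up front, so entries renamed during the loop
    // are not picked up (and converted) a second time.
    $entries = @scandir($dir);

    if($entries === false) {
        echo "\n* * *\tError: Cannot open directory ($dir)\n";
        return;
    }

    echo "* * *\tENTERING: " . $dir . PHP_EOL;

    foreach($entries as $filename) {
        if($filename === '.' || $filename === '..') {
            continue;
        }

        $oldPath = $dir . '/' . $filename;
        $newName = convertEncoding($filename);
        $newPath = $dir . '/' . $newName;

        echo $oldPath . "\n\t=>\t" . $newName . PHP_EOL;

        // Skip the rename when the name is unchanged; keep the old path if it fails.
        if($newName !== $filename && !rename($oldPath, $newPath)) {
            echo "* * *\tError: Cannot rename ($oldPath)\n";
            $newPath = $oldPath;
        }

        if(is_dir($newPath)) {
            traverse($newPath);
        }
    }
}

// The target directory is expected as the first command line argument.
$target = isset($argv[1]) ? $argv[1] : '';

if(is_dir($target)) {
    chdir($target);
    traverse(getcwd());
} else {
    die("\n* * *\tError: Cannot open directory ($target)\n");
}

echo "\n* * * DONE * * *\n";
?>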


Then just open a terminal window and run:

php mb_conv.php /PATH
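
Which direction to convert depends on how the names were damaged, so it can help to try the conversion on a single name before renaming anything. The following is only a minimal sketch and not part of the script: garble() and repair() are hypothetical helper names, the sample name is made up, and it assumes the damage came from UTF-8 bytes being read as cp1251, which is the case the utf-8 to cp1251 conversion in the script undoes.

```php
<?php
// Hypothetical helpers for illustration; this file must be saved as UTF-8.

// Simulate the damage: treat the UTF-8 bytes of $name as cp1251 text
// and re-encode the result as UTF-8.
function garble($name) {
    return mb_convert_encoding($name, 'UTF-8', 'CP1251');
}

// Undo it with the same call the script uses (utf-8 => cp1251).
// Returns null when $name is not valid UTF-8, so the caller can skip it.
function repair($name) {
    if (!mb_check_encoding($name, 'UTF-8')) {
        return null;
    }
    return mb_convert_encoding($name, 'CP1251', 'UTF-8');
}

$original = 'тест.txt';          // made-up sample name
$garbled  = garble($original);   // what the broken name looks like
$repaired = repair($garbled);    // what the script would rename it to

echo $garbled . "\n\t=>\t" . $repaired . PHP_EOL;
echo ($repaired === $original ? 'round trip OK' : 'round trip FAILED') . PHP_EOL;
```

If names taken from your archive do not come out readable this way, they were probably damaged differently and the two encoding constants need to be changed. Also keep in mind that cp1251 leaves byte 0x98 undefined, so a name containing a character whose UTF-8 form includes that byte (the capital И, for example) may not survive the round trip.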


Later I found a really good tool that does the same thing: convmv.
If you are using Ubuntu you can open a terminal window and type:
sudo apt-get install convmv
convmv --notest -r -f cp1251 -t utf8 DIR


Hope that helps someone out there :).
