PHP + encoding + readdir
PHP and character encoding problems
checkCharset(testString, targetString);
This function helps to find the encoding of a string. Sometimes you read data from a database or a file and, after putting it into your code, you get strange symbols instead of local characters (e.g. ą, ę, š). The function converts testString from each encoding on its list to UTF-8 and compares the result with targetString. It returns a string with the name of the encoding, or false if no encoding is recognized.
testString is the string you received from a database or another source and whose encoding you want to find out. Note that for UTF-encoded files the string must not contain a BOM header.
targetString is the string you expect to get, written in UTF-8.
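Because of the BOM restriction above, it may help to strip a leading byte order mark before passing file contents to checkCharset(). The following is only a sketch: stripBom() is a hypothetical helper that is not part of the original code, and it handles only the UTF-8 and UTF-16 marks (UTF-32 marks are not covered).

```php
<?php
// Hypothetical helper (an assumption, not part of the original code):
// removes a leading UTF-8 or UTF-16 byte order mark from $data.
function stripBom(string $data): string
{
    $boms = array(
        "\xEF\xBB\xBF", // UTF-8
        "\xFE\xFF",     // UTF-16, big endian
        "\xFF\xFE",     // UTF-16, little endian
    );
    foreach ($boms as $bom) {
        // Compare only the first bytes of $data with the mark.
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            return substr($data, strlen($bom));
        }
    }
    // No mark found: return the data unchanged.
    return $data;
}
```

It could be used as checkCharset(stripBom($file), 'żłóą'), where $file holds the raw file contents.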
Function checkCharset()
function checkCharset($testString, $targetString){
$out=false;
$encoding=array("ASCII","ISO-8859-1","ISO-8859-2","ISO-8859-3","ISO-8859-4","ISO-8859-5","ISO-8859-7","ISO-8859-9","ISO-8859-10","ISO-8859-13","ISO-8859-14","ISO-8859-15","ISO-8859-16","KOI8-R","KOI8-U","KOI8-RU","CP1250","CP1251","CP1252","CP1253","CP1254","CP1257","CP850","CP866","Mac Roman","Mac CentralEurope","Mac Iceland","Mac Croatian","Mac Romania","Mac Cyrillic","Mac Ukraine","Mac Greek","Mac Turkish", "Macintosh","ISO-8859-6","ISO-8859-8", "CP1255","CP1256", "CP862","Mac Hebrew","Mac Arabic","EUC-JP", "SHIFT_JIS", "CP932", "ISO-2022-JP", "ISO-2022-JP-2", "ISO-2022-JP-1","EUC-CN","HZ","GBK","GB18030","EUC-TW","BIG5","CP950","BIG5-HKSCS","ISO-2022-CN","ISO-2022-CN-EXT","EUC-KR","CP949","ISO-2022-KR","JOHAB","ARMSCII-8","Georgian-Academy","Georgian-PS","KOI8-T","TIS-620","CP874","MacThai","MuleLao-1","CP1133","VISCII","TCVN","CP1258","HP-ROMAN8","NEXTSTEP","UTF-8","UCS-2","UCS-2BE","UCS-2LE","UCS-4","UCS-4BE","UCS-4LE","UTF-16","UTF-16BE","UTF-16LE","UTF-32","UTF-32BE","UTF-32LE","UTF-7","C99","JAVA","UCS-2-INTERNAL","UCS-4-INTERNAL","CP437","CP737","CP775","CP852","CP853","CP855","CP857","CP858","CP860","CP861","CP863","CP865","CP869","CP1125","CP864","EUC-JISX0213","Shift_JISX0213","ISO-2022-JP-3","TDS565","RISCOS-LATIN1");
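foreach($encoding as $v){
    // iconv() returns false and raises a notice or warning when it does not support $v
    // or when $testString is not valid in $v, so those messages are suppressed with @
    if(@iconv($v, "UTF-8", $testString) === $targetString){$out=$v;}
}
// several encodings can match; the last match on the list is returned
return $out;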
}
Example of use:
<?php
// imagine that you have a file test.txt containing the characters "żłóą"
$file = file_get_contents('test.txt');
echo 'File encoding is: ';
echo checkCharset($file, 'żłóą');
?>
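Several encodings can convert the same bytes to the same text, and checkCharset() reports only one of them. As a rough sketch, a variant could collect every match instead. The name findCharsets() and the short candidate list below are assumptions made for illustration, and the example expects the script itself to be saved as UTF-8.

```php
<?php
// Hypothetical variant of checkCharset(): returns all candidate encodings
// whose conversion to UTF-8 equals $targetString.
function findCharsets(string $testString, string $targetString, array $candidates): array
{
    $matches = array();
    foreach ($candidates as $encoding) {
        // @ hides the notice or warning iconv() raises for unsupported
        // encodings or invalid byte sequences; it returns false in that case.
        $converted = @iconv($encoding, 'UTF-8', $testString);
        if ($converted !== false && $converted === $targetString) {
            $matches[] = $encoding;
        }
    }
    return $matches;
}

// The bytes of "żłóą" as stored in an ISO-8859-2 file.
$bytes = "\xBF\xB3\xF3\xB1";
$candidates = array('ASCII', 'ISO-8859-1', 'ISO-8859-2', 'CP1250', 'UTF-8');
print_r(findCharsets($bytes, 'żłóą', $candidates));
```

An empty array plays the role of the false returned by checkCharset().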
readdir() and local characters in file names
When you use readdir() to read a file name that contains local characters, you have to convert it with iconv() using the proper encoding. But how do you find out which encoding you should use?
Download this zip archive. Unzip it on your server and run index.php.
The archive contains 2 files. The first is index.php with a special version of the checkCharset() function. The second is an empty text file with some local characters in its name.
Sorry for my English. If you can help me by correcting errors, please contact me and tell me what should be improved.
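Once the encoding is known, the conversion itself is short. The sketch below is not the index.php from the archive. listDirUtf8() is a hypothetical helper, and $fsEncoding stands for whatever encoding a function like checkCharset() reports for your server.

```php
<?php
// Hypothetical helper: returns the entries of $dir with their names
// converted from $fsEncoding (an assumed file system encoding) to UTF-8.
function listDirUtf8(string $dir, string $fsEncoding): array
{
    $names = array();
    $handle = @opendir($dir);
    if ($handle === false) {
        // The directory could not be opened.
        return $names;
    }
    // readdir() returns false when there are no more entries.
    while (($entry = readdir($handle)) !== false) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $utf8 = @iconv($fsEncoding, 'UTF-8', $entry);
        // Keep the raw name when the conversion fails.
        $names[] = ($utf8 === false) ? $entry : $utf8;
    }
    closedir($handle);
    sort($names);
    return $names;
}
```

To find $fsEncoding, a raw name returned by readdir() can be passed to checkCharset() as testString, with the name you expect to see as targetString.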