Comments on: PHP, XML, and Unicode https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/ Open information and technology. Sat, 04 Mar 2006 01:46:04 +0000

By: david https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-422 Sat, 04 Mar 2006 01:46:04 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-422 Thanks Aristotle. WordPress’s new GUI editor was mangling my postings badly, and I figured out how to disable it halfway through making the posting. I have no idea why it changed my hrefs, but I fixed them by hand.

By: Aristotle Pagaltzis https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-421 Fri, 03 Mar 2006 23:23:55 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-421 Meta note: for some reason, most of your links have xhref instead of href attributes, and in the one tag where the attribute is spelled href its value is empty.

By: david https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-420 Wed, 01 Mar 2006 22:48:29 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-420 Thanks for the info, Jirka. phpinfo() shows versions for both libxml and expat with PHP4, and libxml and libxml2 for PHP5.

By: Jirka Kosek https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-419 Wed, 01 Mar 2006 22:46:04 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-419 And one additional note. If you are using XML under PHP5 it is possible to read documents in any encoding supported by libxml2. AFAIK libxml2 uses iconv for encoding handling, so you can load documents in virtually any encoding, including iso-8859-x, windows-125x and so on.

By: Jirka Kosek https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-418 Wed, 01 Mar 2006 22:42:38 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-418 [2] You can see which XML library is actually used in the “xml” section of the phpinfo() output.

The authors of the XML extensions in PHP5 carefully modelled the behaviour of the xml_ functions on the new underlying library. This is good for backward compatibility; OTOH some problems were transferred to the new API (e.g. see http://www.codecomments.com/archive222-2005-9-598406.html).

By: John Cowan https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-417 Wed, 01 Mar 2006 20:17:51 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-417 No multi-byte UTF-8 sequence can contain an ASCII character — that’s one of the design points of UTF-8. So you are taking precautions against a problem that doesn’t exist. (It does exist in UTF-16, however.)
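
The point above can be checked mechanically. The following is a minimal sketch in Python (chosen only for illustration; it is not PHP and not from the original thread). The helper name `ascii_bytes_in_multibyte` and the sample string are hypothetical. It collects every byte below 0x80 that appears inside the encoded form of a non-ASCII character:

```python
def ascii_bytes_in_multibyte(text, encoding):
    """Return ASCII-range byte values (< 0x80) found inside the encoded
    form of the non-ASCII characters in `text`."""
    found = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # real ASCII characters are not the concern here
        found.extend(b for b in ch.encode(encoding) if b < 0x80)
    return found

# Hypothetical sample: U+013C encodes in UTF-16 as the bytes 0x01 0x3C,
# and 0x3C is the ASCII code for '<'.
sample = "na\u00efve \u6f22\u5b57 \u013c <tag>"

utf8_hits = ascii_bytes_in_multibyte(sample, "utf-8")       # expected: none
utf16_hits = ascii_bytes_in_multibyte(sample, "utf-16-be")  # includes 0x3C
```

In UTF-8 every byte of a multi-byte sequence has its high bit set, so a byte-oriented scan for `<` or `&` cannot hit the middle of a character. In UTF-16 the same scan can.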

By: david https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-416 Wed, 01 Mar 2006 18:30:24 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-416 Are you certain, Jirka, that the old xml_parser_create() interface isn’t still using Expat? If not, then I’m especially impressed that my script gives byte-for-byte identical output with PHP4 and PHP5.

By: Jirka Kosek https://quoderat.megginson.com/2006/03/01/php-xml-and-unicode/#comment-415 Wed, 01 Mar 2006 18:00:24 +0000 http://www.megginson.com/blogs/quoderat/archives/2006/03/01/php-xml-and-unicode/#comment-415 XML support in PHP5 has been completely reworked and uses libxml2 as its base, not expat.

If you want to work with XML seriously in PHP, you need at least version 5.1. Earlier versions were missing critical features such as the ability to bind prefixes to namespaces for XPath evaluation.
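
To show why prefix binding matters for path queries, here is a small sketch using Python's standard `xml.etree.ElementTree` (an illustration of the general idea only, not of the PHP API; in PHP the corresponding call is `DOMXPath::registerNamespace()`). The sample document and the prefix `a` are made up for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose elements live in a default namespace.
doc = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry><title>PHP, XML, and Unicode</title></entry>'
    '</feed>'
)

# Without a prefix binding, the path names elements in no namespace,
# so it matches nothing.
unbound = doc.findall("entry/title")

# Binding an arbitrary prefix to the namespace URI makes the query work.
ns = {"a": "http://www.w3.org/2005/Atom"}
titles = [t.text for t in doc.findall("a:entry/a:title", ns)]
```

The prefix used in the query does not have to match anything in the document; only the namespace URI it is bound to counts.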

PHP doesn’t support Unicode; it treats a string as a sequence of bytes, so you are responsible for making string operations correct yourself. This can be overcome with the mbstring extension, which can make many PHP functions UTF-8 aware.
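
The difference between bytes and characters can be sketched as follows. This uses Python's `bytes` type as a stand-in for a byte-oriented string (an analogy, not PHP code; in PHP the contrast would be roughly `strlen()` versus `mb_strlen()`). The sample word is arbitrary:

```python
text = "Dvo\u0159\u00e1k"     # "Dvořák": 6 characters
raw = text.encode("utf-8")    # what a byte-oriented string function sees

byte_length = len(raw)        # counts bytes
char_length = len(text)       # counts characters

# Cutting at a byte offset can split a multi-byte character in half.
truncated = raw[:4]           # "Dvo" plus the first byte of "ř"
try:
    truncated.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False

# Cutting by characters first and encoding afterwards stays valid.
safe = text[:4].encode("utf-8")
```

Any operation that counts, slices, or pads by position has this problem on byte strings, which is why a character-aware layer is needed.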

Even in PHP 5.1 there are some unresolved issues:

SAX-like parser: doesn’t report all XML events (compared to the original Java SAX2), and has no OO interface; handlers are just plain functions

SimpleXML (simple XML2OO mapping): doesn’t support mixed content; namespaces are supported in a very inconvenient way

XMLReader (pull parser): is missing several critical methods, including readString()

Because of the missing Unicode support and some problems in the XML APIs, PHP is still far behind Java and .NET in XML support.
