Google Weather + Character Encodings

I’m building an application for my “Data, Schemas & Applications” module at university. It uses weather data from Google to display the weather for given locations. Google’s weather API is fairly simple to understand: it is a REST-based service and does not require an API key.
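Because the service is REST-based, a request is just an HTTP GET on a URL whose query string names the location. The sketch below builds such a URL. The base URL and the `weather` parameter name are placeholders assumed for illustration. They are not a documented Google endpoint.

```php
<?php
// Hypothetical endpoint used only for illustration; substitute the real
// feed URL your application uses.
const WEATHER_BASE_URL = 'https://example.com/weather';

/**
 * Build a GET request URL for a location (a minimal sketch).
 * The "weather" query parameter name is an assumption.
 */
function build_weather_url(string $location): string
{
    // http_build_query() percent-encodes the value, so spaces and commas
    // in place names such as "Bristol, UK" are safe to put in a URL.
    return WEATHER_BASE_URL . '?' . http_build_query(['weather' => $location]);
}

echo build_weather_url('Bristol, UK'), "\n";
// prints https://example.com/weather?weather=Bristol%2C+UK
```

By default, `http_build_query()` encodes spaces as `+` and commas as `%2C`, the same way `urlencode()` does.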

You can use PHP to download a copy of the weather data for your chosen city and then process it to output the weather however you choose. Prakash has provided a function that can be used on the university campus to load files through the university’s proxy. It looks like this:

/*********************************************************
 * @function: get_file
 * @author: Chris Wallace
 * @created: 30 November 2009
 * @updated: 20 January 2012
 * @source: http://www.cems.uwe.ac.uk/~pchatter/php/dsa/dsa_utility.phps
 *
 * This function will get any file through the UWE proxy.
 *
 * It has been adapted so that if we
 * are running on our local testing server, we do not
 * use the proxy, as Ben's private server
 * does not have proxy requirements.
 *********************************************************/
function get_file($uri) {

    // Conditional: Do we need to use the proxy?
    // Match any host ending in "cems.uwe.ac.uk" (e.g. www.cems.uwe.ac.uk).
    $host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
    if (substr($host, -strlen('cems.uwe.ac.uk')) === 'cems.uwe.ac.uk') { // Conditional @value: Yes

        // Create a context for the PHP file_get_contents function
        $context = stream_context_create(array('http' => array('proxy' => 'proxysg.uwe.ac.uk:8080', 'header' => 'Cache-Control: no-cache')));

        // Get the contents of the requested URI through the proxy
        $contents = file_get_contents($uri, false, $context);

    } else { // Conditional @value: No

        // Get the contents of the requested URI without use of the proxy
        $contents = file_get_contents($uri, false);

    } // End Conditional

    // Return the contents of the file (false if the request failed)
    return $contents;

}
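The proxy decision in `get_file` depends only on the host name. It can therefore be moved into a small function and checked without a network connection. `needs_proxy` is a hypothetical helper name. This sketch matches any host ending in `cems.uwe.ac.uk` (such as `www.cems.uwe.ac.uk`), which is an assumption about how the campus server is addressed.

```php
<?php
/**
 * Decide whether a request must go through the UWE proxy (a sketch).
 * Assumes campus hosts end in "cems.uwe.ac.uk".
 */
function needs_proxy(string $host): bool
{
    $suffix = 'cems.uwe.ac.uk';
    // The host must be at least as long as the suffix, and its last
    // strlen($suffix) characters must equal the suffix.
    return strlen($host) >= strlen($suffix)
        && substr($host, -strlen($suffix)) === $suffix;
}

var_dump(needs_proxy('www.cems.uwe.ac.uk')); // bool(true)
var_dump(needs_proxy('localhost'));          // bool(false)
```

Inside `get_file`, the condition could then read `if (needs_proxy($_SERVER['HTTP_HOST'] ?? ''))`.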

We later use that function to fetch the Google Weather XML file and load it as a SimpleXML object. However, using Google’s API with PHP’s SimpleXML class raises a problem with character encodings. Every text file is stored in some character encoding, which tells the computer which character each byte (or sequence of bytes) represents. Common character encodings in everyday use include:

  • ASCII – American Standard Code for Information Interchange
  • UTF-8 – UCS Transformation Format, the most common format used on the World Wide Web.
  • ISO/IEC 8859-1 – more commonly known as Latin1, a common choice for storing data in MySQL databases.
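To see why the encoding matters, compare the bytes used for the same character in two encodings. The sketch below assumes PHP 7 or later and the iconv extension. It prints the bytes of “é” in UTF-8 and in ISO-8859-1. ASCII has no code for this character at all.

```php
<?php
// "\u{e9}" is the UTF-8 encoding of "é", whatever the script file's encoding.
$utf8 = "\u{e9}";

// Re-encode the same character as ISO-8859-1 (Latin1).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo bin2hex($utf8), "\n";   // c3a9: two bytes in UTF-8
echo bin2hex($latin1), "\n"; // e9: one byte in Latin1
```

A program that reads the Latin1 byte `E9` while expecting UTF-8 sees an invalid byte sequence rather than “é”.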

As it turns out, the Google API returns XML documents in an encoding called “GB18030”. This is a Chinese government standard that covers Latin characters (used in English, French, Italian and other European languages) as well as Chinese, Japanese and other East Asian characters, which allows the API to work in those countries.
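GB18030 is ASCII-compatible and can encode all of Unicode, so text converted to GB18030 and back loses nothing. The quick check below uses iconv and assumes the installed iconv library supports GB18030, as glibc’s does.

```php
<?php
// Mixed Latin and Chinese text, written with Unicode escapes so the
// script's own encoding does not matter: "Café 北京" (Beijing).
$original = "Caf\u{e9} \u{5317}\u{4eac}";

$gb = iconv('UTF-8', 'GB18030', $original);  // UTF-8 -> GB18030
$roundTrip = iconv('GB18030', 'UTF-8', $gb); // GB18030 -> UTF-8

// Plain ASCII is byte-for-byte identical in both encodings.
var_dump(iconv('UTF-8', 'GB18030', 'London') === 'London'); // bool(true)
// The GB18030 bytes differ from the UTF-8 bytes...
var_dump($gb === $original);                                // bool(false)
// ...but the text survives the round trip unchanged.
var_dump($roundTrip === $original);                         // bool(true)
```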

However, PHP’s SimpleXML class expects its input to be in UTF-8, so we need to convert the returned XML into UTF-8. As it turns out, this is fairly simple.
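Before handing a string to SimpleXML, it can help to check whether the string is already valid UTF-8. One common idiom, used here as a sketch, is an empty regular expression with PCRE’s `u` modifier: `preg_match()` returns `false` when the subject is not valid UTF-8. The helper name `is_valid_utf8` is hypothetical.

```php
<?php
/**
 * Return true if $bytes is well-formed UTF-8 (a sketch).
 * With the /u modifier, preg_match() fails on malformed UTF-8 input.
 */
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(is_valid_utf8("Caf\xC3\xA9")); // bool(true): "Café" in UTF-8
var_dump(is_valid_utf8("Caf\xE9"));     // bool(false): "Café" in Latin1
```

This check cannot identify GB18030 specifically, because pure-ASCII text is valid in both encodings. It only tells you whether the bytes can be parsed as UTF-8 as they stand.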

PHP has an extension called “iconv”, which converts a string from one character encoding to another. With the following code, we can convert Google’s XML from GB18030 to UTF-8.

// We created the get_file function earlier, so we'll use that here.
// $this->weather holds the URI of the Google Weather XML feed.
$file = get_file($this->weather);

// Convert the XML in $file from GB18030 to UTF-8.
// iconv() returns false on failure, e.g. if $file contains invalid bytes.
$xml = iconv("GB18030", "UTF-8", $file);

// Finally, we create a SimpleXML object from the re-encoded XML.
// The @ operator in front of simplexml_load_string suppresses the
//   warnings it would print for an invalid XML file. The function does
//   not throw an error; it simply returns false, so we check for that.
$xml = @simplexml_load_string($xml, NULL, LIBXML_NOCDATA);
if ($xml === false) {
    // The feed could not be parsed; handle the error here
    // (e.g. show a "weather unavailable" message).
}
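The same steps can be wrapped in a function that reports failures instead of silently returning `false`. In this sketch, `load_gb18030_xml` is a hypothetical name. It assumes the downloaded document has no XML encoding declaration; otherwise, libxml would decode the converted bytes using the declared encoding rather than UTF-8. Instead of suppressing parse errors with `@`, the function collects them with `libxml_use_internal_errors()`.

```php
<?php
/**
 * Convert GB18030-encoded XML to UTF-8 and parse it (a sketch).
 * Returns null and fills $errors if conversion or parsing fails.
 */
function load_gb18030_xml(string $data, array &$errors = []): ?SimpleXMLElement
{
    $errors = [];

    // iconv() returns false (with a notice) if $data is not valid GB18030.
    $utf8 = iconv('GB18030', 'UTF-8', $data);
    if ($utf8 === false) {
        $errors[] = 'iconv: input is not valid GB18030';
        return null;
    }

    // Collect libxml errors instead of letting them print as warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($utf8, 'SimpleXMLElement', LIBXML_NOCDATA);
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml === false ? null : $xml;
}

// Usage: build a small GB18030 document in memory and parse it.
$gbXml = iconv('UTF-8', 'GB18030', "<weather><city data=\"Caf\u{e9}\"/></weather>");
$xml = load_gb18030_xml($gbXml);
echo (string) $xml->city['data'], "\n"; // Café
```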

I hope this explains the issue and the solution as simply as possible. If you have any questions about how this works, please post them in the comments section below.