August 2007

"We just wanted to get our work out there and get people to use it. It would have been very difficult otherwise for two university students in Israel to get millions of users without having millions of dollars behind us."

Andi Gutmans, co-creator of PHP

This article provides an example of how a graduate student in Ontario used open source software and freely available data to solve a technical dilemma, start and grow a business, and provide services which benefit many. It also illustrates how easily new features and customizations can be developed when an API is made available to its users.

Why a Geocoder

The API behind the service was first written in July 2004 to overcome the technical hurdles encountered when trying to organize business listings by location and to search by proximity to a given place. At the time, Google Maps and Google Local were not available, and mash-ups weren't mature enough to meet my needs.

A review of commercial applications made it evident that there was little choice: the products available were expensive, incomplete, or inaccurate. The least expensive commercial offering cost $0.25 per query, which becomes increasingly prohibitive as the number of queries grows. Some of the required functionality was simply not available. For example, there was no way to enter a point in latitude/longitude and get back a description of the closest known location to that point.
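That missing closest-point query is straightforward to sketch. The fragment below is a minimal illustration, not the service's actual implementation: a linear scan over a hypothetical set of listings, ranked by great-circle (haversine) distance from the query point. The listing data is invented for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def closest_location(lat, lon, listings):
    """Return the listing nearest to (lat, lon) by great-circle distance."""
    return min(listings, key=lambda d: haversine_km(lat, lon, d["lat"], d["lon"]))

# Illustrative listings only.
listings = [
    {"name": "Ottawa",   "lat": 45.4215, "lon": -75.6972},
    {"name": "Toronto",  "lat": 43.6532, "lon": -79.3832},
    {"name": "Montreal", "lat": 45.5017, "lon": -73.5673},
]
print(closest_location(45.3, -75.9, listings)["name"])  # a point just outside Ottawa
```

A production system would of course replace the linear scan with a spatial index, but the interface — point in, nearest description out — is the feature the commercial products lacked.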

Commercialization efforts commenced in early 2006, when I realized the software had the potential to provide a much-needed income while I studied for a graduate degree. The geocoder became the core product for many applications, the first being a searchable online database of Canadian restaurants.

From a technical standpoint, a geocoder is software that extracts named entities such as civic addresses, intersections, and city names from an input string, then matches those entities against a database of physical locations to return a geographically encoded answer. It is a crucial component of the local web 2.0 space, as it provides the location intelligence behind the content: the geocoder must map a location typed by a human into a cartographically defined point expressed in latitude and longitude.
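The extract-then-match pipeline can be illustrated in miniature. This sketch is purely illustrative, assuming a toy gazetteer and a single address pattern; a real geocoder handles far messier input and a vastly larger location database.

```python
import re

# Illustrative gazetteer: maps a normalized (street, city) key to coordinates.
GAZETTEER = {
    ("bank st", "ottawa"): (45.4112, -75.6951),
    ("yonge st", "toronto"): (43.6708, -79.3867),
}

def geocode(query):
    """Tiny geocoder: extract a street and city, then look them up."""
    # Optional civic number, then "street, city".
    m = re.match(r"\s*(?:\d+\s+)?(.+?),\s*(.+?)\s*$", query.lower())
    if not m:
        return None  # no recognizable address entities in the input
    street, city = m.group(1).strip(), m.group(2).strip()
    return GAZETTEER.get((street, city))

print(geocode("150 Bank St, Ottawa"))  # (45.4112, -75.6951)
```

The two stages — entity extraction (the regular expression) and matching (the gazetteer lookup) — are exactly the steps described above, just at toy scale.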

Relevancy is what differentiates web 2.0 from web 1.0. Web 2.0 companies have innovated heavily to provide the most relevant information to their netizens, and since the amount of information on the global Internet has grown exponentially, proximity to a geographical location has become an important dimension of relevancy. Web 2.0 sites that let visitors search for "What" (the content) as well as "Where" (the proximity context) have a competitive advantage in attracting return visitors, because the "Where" saves time.

For example, what's the use of finding the best restaurant to dine in, when it is very far from your location? The ideal scenario is for the user to get to their desired information with just one click. The sites providing the most relevant information with the least amount of user effort are the ones that will succeed in the long run.

To achieve these goals, it is important to extract and geocode geographical information from large amounts of content quickly and accurately, as well as to quickly and accurately geocode location queries in real time.

Open Data

The initial obstacle was obtaining accurate data. The government of Canada offers free geographical information through the GeoBase portal and the Statistics Canada website, and the US government does the same with its free TIGER/Line dataset. However, the free data is not as clean or as accurate as that sold by commercial providers like NavTeq. Free data wins, but at a cost in accuracy to the user.

Most of the free data is provided in differing, non-standard formats, as government departments gather it for different purposes. While census personnel think of locations as blocks or polygons covering inhabited areas that can be processed to produce policy recommendations for tax collection, the postal service views locations as delivery paths. Unifying these datasets is no easy task. Statistical analysis was used to standardize and correct inconsistencies in the raw data sets, making the quality of the resulting geocoder comparable to, and in some cases better than, the established commercial players.
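One small but representative piece of that unification work is canonicalizing the way different datasets abbreviate street types. The mapping below is a hypothetical fragment for illustration; the real standardization involved statistical analysis across entire datasets.

```python
# Hypothetical normalization table: different source datasets write
# "Street", "St", "Str" etc. Mapping them to one canonical form lets
# records from separate government sources be matched and merged.
CANONICAL = {
    "street": "st", "st": "st", "str": "st",
    "avenue": "ave", "av": "ave", "ave": "ave",
    "road": "rd", "rd": "rd",
}

def normalize(street_name):
    """Lower-case a street name and canonicalize its trailing street type."""
    parts = street_name.lower().split()
    if parts and parts[-1] in CANONICAL:
        parts[-1] = CANONICAL[parts[-1]]
    return " ".join(parts)

print(normalize("Bank Street"))  # bank st
print(normalize("BANK ST"))      # bank st
```

Once every source emits the same canonical key, records describing the same physical street can finally be joined across census, postal, and road-network files.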

The remaining task was to build the software. Open standards such as GML and existing open source libraries provided the building blocks for the natural language processing, and a MySQL database was used to store the processed datasets.

Customer Value

The geocoder was released on July 5th, 2005 as a free service for the many non-profit and open source projects that require geocoding to build more powerful location-based information retrieval systems. The original project which gave rise to the need for a geocoder still uses this technology, and currently serves over a million pageviews a month to netizens seeking local information on food and dining. It is just one of many examples of how geographical information can add value to a web site.

The primary requirement for customers using a geocoder is accuracy, followed by the versatility of its geocoding functions. The service has been able to quickly customize software and add unique features to serve the needs of very specific customers and markets. By collaborating with the service, customers gain a competitive advantage by accessing information that is not currently available from competitors. For example, several asset tracking and management companies use the reverse geocoding feature to find the road an asset is currently traveling on, and compute the asset's speed using the service's relative proximity functions.
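The speed computation those asset-tracking customers rely on reduces to distance over time between two timestamped position fixes. The sketch below is an illustration of the idea, not the service's actual code; the fixes are invented.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def speed_kmh(fix_a, fix_b):
    """Average speed between two (lat, lon, unix_seconds) GPS fixes."""
    (lat1, lon1, t1), (lat2, lon2, t2) = fix_a, fix_b
    hours = abs(t2 - t1) / 3600.0
    return haversine_km(lat1, lon1, lat2, lon2) / hours

# Two fixes 60 seconds apart, roughly 1 km apart -> about 60 km/h.
v = speed_kmh((45.4215, -75.6972, 0), (45.4305, -75.6972, 60))
print(round(v))  # 60
```

Combined with reverse geocoding of either fix, this yields both "which road" and "how fast" from raw GPS coordinates.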

Many other free projects have utilized the lightweight XML geocoding port to provide relevant and valuable information on a variety of topics. These include finding and mapping free wireless hot spots in Canadian cities and gathering and analyzing data about pollution.
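Consuming such a lightweight XML port typically means fetching a small XML document and pulling out two numbers. The response shape below is an assumption for illustration — the port's actual schema and element names may differ — but the parsing pattern is the same either way.

```python
import xml.etree.ElementTree as ET

# Hypothetical response document; the real XML port's schema may differ.
SAMPLE = """<geodata>
  <latt>45.4215</latt>
  <longt>-75.6972</longt>
</geodata>"""

def parse_geocode(xml_text):
    """Extract (lat, lon) floats from a geocoding XML response."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("latt")), float(root.findtext("longt"))

print(parse_geocode(SAMPLE))  # (45.4215, -75.6972)
```

A client would obtain the document over HTTP and hand it to a parser like this; the handful of lines involved is why volunteers could port clients to so many languages.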

The list of users keeps growing and there is an increasing number of developers working on new projects with the aim of using better information to improve the quality of life in their local communities.

The next step is to bring even more accurate and versatile location intelligence to the masses for free. This will be followed by expansion into other countries and languages, starting with the European Union countries, and by geocoding of physical landscape features and landmarks.

This will require the development of a semantic location search. This field is a subcategory of the semantic web idea, and involves using location intelligence for highly structured search queries such as "How many people live within x distance of the Ottawa river?". The data for answering such questions is available; the tools, however, have not been developed yet.
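To make the idea concrete, a query like the one above can be answered by combining two geocoded datasets with a distance predicate. The sketch below uses invented sample data — a river approximated as sampled points and a pair of census blocks — purely to show the shape of such a tool, not how the finished system would work.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Illustrative data: a river sampled as points, and census blocks with populations.
river = [(45.42, -75.70), (45.45, -75.65), (45.48, -75.60)]
blocks = [
    {"pop": 5000, "lat": 45.43, "lon": -75.69},  # close to the river
    {"pop": 8000, "lat": 44.00, "lon": -77.00},  # far away
]

def population_within(km, river_pts, census_blocks):
    """Sum the population of blocks within `km` of any sampled river point."""
    return sum(
        b["pop"] for b in census_blocks
        if min(haversine_km(b["lat"], b["lon"], r_lat, r_lon)
               for r_lat, r_lon in river_pts) <= km
    )

print(population_within(10, river, blocks))  # 5000
```

A real semantic location search would parse the natural-language question into this kind of structured spatial query automatically; that translation step is the part that remains to be built.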

Paying the Bills

The business model is simple:

  1. if you are a not-for-profit, you may use the lightweight XML geocoding port for free
  2. if you are a for-profit entity, you may use the services for a fee. Commercial clients purchase credits for using the XML port. Consulting services and support are also available to commercial clients.

This model provides several benefits:

  • a good relationship with the community
  • a good reputation for the service
  • valuable customer feedback leading to new functionality
  • funding of in-house development
  • 100% word-of-mouth marketing

And how successful has this model proven to be? To date, there are over 700 not-for-profit users and 390 commercial customers. In addition, volunteers have contributed free client modules for the XML port in nearly every major programming language in use today. These modules allow users and customers to integrate the geocoder into their own custom applications, and can be found by searching CPAN (for Perl modules) and Google. The service runs on the well-established LAMP stack. To date, I have not spent any money on software licenses; the only costs of running the business are hosting fees, hardware, and my time. Thanks to the open source ecosystem, a minimal investment has produced a tool that helps improve the quality and relevancy of information accessed on the web.

Advances in processing and obtaining relevant information about our world and its physical environment have greatly improved our quality of life. This is partly because people have been free to put their ideas to work for the common good. I will continue to work towards implementing algorithms that currently exist only in theory in the fields of natural language processing and computational geometry.
