API Data Formats You Need To Know

On the internet, acronyms rule, and on social media sites they are even more prevalent. More importantly, some of these acronyms are the backbone of your favorite sites. The ones you will see most often include JSON, RSS, and Atom. They matter because they are used for data transfer and content syndication. Aggregation sites like FriendFeed rely on these technologies to gather information, such as your blog posts, and display it on their own site. The same formats are used heavily in the APIs of various social media sites. Why does this matter? If you want to use that information to build your own site, or just a simple mashup, you need to know more about them.


The first step is to learn more about RSS. There is a very detailed overview of the history of RSS on Wikipedia. I point you to the history because RSS has forked more than once, and several versions are still in use. The two versions you really want to focus on are RSS 0.91 and RSS 2.0, along with Media RSS, which is an extension to RSS 2.0 rather than a version of its own. Review the history and these three specifications to decide what you want to generate, or to find out what data you can expect to see in a feed you consume.
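To make this a little more concrete, here is a minimal sketch of reading an RSS 2.0 feed with Python's standard library. The sample feed, its titles and URLs, and the helper name parse_rss are all made up for illustration. A real feed will usually carry more elements than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document; every title and URL here is invented.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Posts from an example blog</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Tue, 24 Mar 2009 12:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <pubDate>Wed, 25 Mar 2009 12:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(text):
    """Return the declared RSS version and a list of (title, link) pairs."""
    root = ET.fromstring(text)
    # The version lives in an attribute on the root <rss> element.
    version = root.get("version")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]
    return version, items


if __name__ == "__main__":
    version, items = parse_rss(SAMPLE_RSS)
    print("RSS version:", version)
    for title, link in items:
        print(title, "->", link)
```

Checking the version attribute first is worthwhile because, as the history shows, the elements you can rely on differ between versions.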


Atom is another XML-based format, but it was developed specifically as an alternative to RSS. Again, Wikipedia has a very good article on Atom, including some comparisons with other formats. The Atom specification is published as IETF RFC 4287. The purpose of Atom is best described by the spec itself:

The primary use case that Atom addresses is the syndication of Web content such as weblogs and news headlines to Web sites as well as directly to user agents.

Atom is often considered a better format for blog feeds than RSS because it was designed specifically for syndicating content. RSS is older and has been put to more general purposes.
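The practical difference you notice first when reading Atom is that every element lives in the Atom XML namespace. Here is a rough sketch, again using only the Python standard library. The sample feed and the helper name parse_atom are assumptions of mine, not something taken from a real site.

```python
import xml.etree.ElementTree as ET

# The Atom namespace defined by RFC 4287.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical Atom feed; the ids, titles, and URLs are invented.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <id>urn:example:feed</id>
  <updated>2009-03-24T12:00:00Z</updated>
  <entry>
    <title>First post</title>
    <id>urn:example:first</id>
    <updated>2009-03-24T12:00:00Z</updated>
    <link rel="alternate" href="https://example.com/first"/>
  </entry>
</feed>"""


def parse_atom(text):
    """Return one dict per <entry> with its title, updated time, and link."""
    root = ET.fromstring(text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", namespaces=ATOM_NS),
            # Atom puts the URL in an href attribute, not in element text.
            "href": link.get("href") if link is not None else None,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_atom(SAMPLE_ATOM):
        print(entry["updated"], entry["title"], entry["href"])
```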


JSON can be considered the new kid on the block regardless of how long the technology has been around. It is only in the last few years that JSON has really been accepted as a data transfer format. It stands for JavaScript Object Notation. The idea is that you pass data around in what looks like a JavaScript object. For more detail on what this format looks like, review JSON.org for a good overview, and always remember to wrap your keys in double quotes. The JSON format requires it, and Internet Explorer will get very unhappy otherwise. Wikipedia also has a nice article on JSON, with some interesting tidbits and a comparison to other formats. I highly recommend becoming familiar with JSON, as most sites that provide an API support JSON as an output format.
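As a quick illustration, here is a small sketch of producing and consuming JSON with Python's built-in json module. The post dictionary and the helper name is_valid_json are made up for the example. The last two calls show why quoting your keys matters: a strict parser rejects the unquoted version.

```python
import json

# Hypothetical data for a blog post; the field names are invented.
post = {
    "title": "API Data Formats You Need To Know",
    "tags": ["json", "rss", "atom"],
    "published": True,
}

# Serialize to a JSON string, then parse it back into Python objects.
encoded = json.dumps(post, sort_keys=True)
decoded = json.loads(encoded)


def is_valid_json(text):
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(encoded)
    print(decoded == post)
    print(is_valid_json('{"title": "quoted key"}'))
    print(is_valid_json('{title: "unquoted key"}'))
```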


RDF, or Resource Description Framework, is a very different beast. Wikipedia comes to our rescue again with its RDF article. Where RSS is a general format that people use to “describe” a blog’s feed, RDF is a general format to “describe” almost anything. The main problem with RDF is that many people do not really understand it. The first paragraph of the Wikipedia overview shows why:

Basically speaking, the RDF data model is not different from classic conceptual modeling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources, in particular, Web resources, in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology.

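Thankfully, RDF can be serialized into XML. The RDF for this post could be:

        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rdf:Description rdf:about="https://regulargeek.com/2009/03/24/api-data-formats-you-need-to-know">
                <dc:title>API Data Formats You Need To Know</dc:title>
                <dc:publisher>Rob Diana</dc:publisher>
            </rdf:Description>
        </rdf:RDF>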

RDF is fairly popular among semantic web technologies because it is much more descriptive and convenient than defining a custom data format for everything. RDF is also already in use in the social media world in the form of FOAF (Friend of a Friend). There are many other applications of RDF in the wild, so check the Wikipedia article if you are interested.
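If the idea of triples still feels abstract, this sketch may help. It reads a small RDF/XML document describing this post and prints each statement as a subject, predicate, and object. It uses only Python's standard library and handles just the simplest case, an rdf:Description with plain text properties, so treat it as an illustration rather than an RDF parser. The helper name extract_triples is my own, and a dedicated library such as rdflib is the better choice for real work.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# RDF/XML for this post, with the standard RDF and Dublin Core namespaces.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="https://regulargeek.com/2009/03/24/api-data-formats-you-need-to-know">
    <dc:title>API Data Formats You Need To Know</dc:title>
    <dc:publisher>Rob Diana</dc:publisher>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(text):
    """Return (subject, predicate, object) tuples for literal properties."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # The subject is the resource named by rdf:about.
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; joining the
            # two parts gives the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            triples.append((subject, predicate, prop.text))
    return triples


if __name__ == "__main__":
    for subject, predicate, obj in extract_triples(SAMPLE_RDF):
        print(subject, predicate, obj, sep="\n  ")
```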


Custom XML is the last resort in data transfer formats. Basically, XML is still used because so many people know how to read, parse and use it in their programming language of choice. The benefit of a custom XML format is that it is very expressive and flexible. However, there are problems with this approach. Every client that reads your data format needs custom code to handle it, and any change in the definition of the data requires changes on the client side as well.
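Here is a small sketch of that coupling problem. Both documents and the element names are hypothetical. The client function read_post is written against the first version of an invented format, and it quietly loses the title when the server renames one element.

```python
import xml.etree.ElementTree as ET

# Two versions of an invented custom format. Version 2 renames <title>.
POST_V1 = "<post><title>Hello</title><author>Rob</author></post>"
POST_V2 = "<post><headline>Hello</headline><author>Rob</author></post>"


def read_post(text):
    """Client code written against version 1 of the format."""
    root = ET.fromstring(text)
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
    }


if __name__ == "__main__":
    print(read_post(POST_V1))
    # The same client against version 2: the title comes back as None
    # until someone updates the client code.
    print(read_post(POST_V2))
```

With a shared format like RSS or Atom, this kind of client code can usually come from an existing library instead.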


Another new player is SUP, the Simple Update Protocol, which was defined by our friends at FriendFeed. The initial blog post on SUP has a good high-level description:

SUP is a simple and compact “ping feed” that web services can produce in order to alert the consumers of their feeds when a feed has been updated. This reduces update latency and improves efficiency by eliminating the need for frequent polling.

I know this just sounds like a simple ping, but they have done something interesting with it. The idea is that a site like FriendFeed has one URL that you can poll. The response to that single poll request can contain the SUP-IDs of thousands of feeds. Essentially, you get a “meta-feed” that tells you which feeds have changed. It is an interesting idea and it does have some support, but it is not as widely implemented as some of the others.
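To show how a consumer might use that meta-feed, here is a rough sketch. I am assuming the SUP document is JSON with an "updates" list of SUP-ID and update-ID pairs, which is how I understand the draft. The exact field names, the sample IDs, the feed URLs, and the helper name changed_feeds are all assumptions for illustration.

```python
import json

# Hypothetical SUP document, as it might come back from one poll request.
SAMPLE_SUP = """{
  "updated_time": "2009-03-24T12:01:00Z",
  "since_time": "2009-03-24T12:00:00Z",
  "period": 60,
  "updates": [["a1b2c3", "141"], ["d4e5f6", "142"], ["zzz999", "143"]]
}"""

# The feeds we care about, keyed by the SUP-ID each feed advertises.
SUBSCRIPTIONS = {
    "a1b2c3": "https://example.com/alice/feed",
    "d4e5f6": "https://example.com/bob/feed",
    "0f0f0f": "https://example.com/carol/feed",
}


def changed_feeds(sup_text, subscriptions):
    """Return the subscribed feed URLs whose SUP-ID appears in the updates."""
    doc = json.loads(sup_text)
    updated_ids = {sup_id for sup_id, _update_id in doc["updates"]}
    return sorted(
        url for sup_id, url in subscriptions.items() if sup_id in updated_ids
    )


if __name__ == "__main__":
    # Only these feeds need to be fetched; the rest can be skipped.
    for url in changed_feeds(SAMPLE_SUP, SUBSCRIPTIONS):
        print("fetch", url)
```

The saving comes from the last step: one small request replaces a separate poll of every feed you follow.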

Anything Else?

I am sure there are many other formats that you could find and try to learn, but these will get you started. For each of these technologies, there are probably freely available libraries in the language of your choice for consuming and producing these formats as well. However, that is probably a topic for another post.

3 thoughts on “API Data Formats You Need To Know”

  1. rgeorge28,

    REST is more of an architectural style than a specification. Generally speaking, most APIs that web applications offer are “RESTful”, but that term is also hard to define. If you look at the Wikipedia article on REST (http://en.wikipedia.org/wiki/REST), you will see that it is light on specifics, which causes some confusion.

