<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Regular Geek &#187; Miscellaneous</title>
	<atom:link href="http://regulargeek.com/category/miscellaneous/feed/" rel="self" type="application/rss+xml" />
	<link>http://regulargeek.com</link>
	<description>Where programming, the internet and social media collide.</description>
	<lastBuildDate>Fri, 11 May 2012 12:56:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<cloud domain='regulargeek.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Happy 4th Birthday To Regular Geek</title>
		<link>http://regulargeek.com/2011/12/12/happy-4th-birthday-to-regular-geek/</link>
		<comments>http://regulargeek.com/2011/12/12/happy-4th-birthday-to-regular-geek/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 13:00:27 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Geek Reading]]></category>
		<category><![CDATA[google reader]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=3957</guid>
		<description><![CDATA[I had no idea when I started this blog that it would last past a few months. Now, four years later, I am still blogging albeit a little slower lately. This year saw a typical blogging funk late in the year, mostly due to a lack of time, but also due to fewer big announcements [...]]]></description>
			<content:encoded><![CDATA[<p>I had no idea when I started this blog that it would last past a few months. Now, four years later, I am still blogging albeit a little slower lately. This year saw a typical blogging funk late in the year, mostly due to a lack of time, but also due to fewer big announcements and more smaller product iterations.</p>
<p>So, the top 11 posts written this year were:</p>
<ol>
<li><a href="http://regulargeek.com/2010/12/11/9-programming-languages-to-watch-in-2011/" target="_blank">9 Programming Languages To Watch In 2011</a></li>
<li><a href="http://regulargeek.com/2011/02/02/tradition-programming-language-job-trends-february-2011/" target="_blank">Traditional Programming Language Job Trends – February 2011</a></li>
<li><a href="http://regulargeek.com/2011/02/09/web-scripting-programming-language-job-trends-february-2011/" target="_blank">Web &amp; Scripting Programming Language Job Trends – February 2011</a></li>
<li><a href="http://regulargeek.com/2011/06/27/5-jquery-scripts-to-create-a-great-first-impression/" target="_blank">5 jQuery Scripts To Create a Great First Impression</a></li>
<li><a href="http://regulargeek.com/2011/02/10/twitter-stops-whitelisting-applications/" target="_blank">Twitter Stops Whitelisting Applications</a></li>
<li><a href="http://regulargeek.com/2011/06/29/google-plus-looks-good-but-needs-an-application-platform/" target="_blank">Google Plus Looks Good But Needs An Application Platform</a></li>
<li><a href="http://regulargeek.com/2011/08/03/traditional-programming-language-job-trends-august-2011/" target="_blank">Traditional Programming Language Job Trends – August 2011</a></li>
<li><a href="http://regulargeek.com/2011/07/25/13-free-gtd-online-tools-for-mac-windows-or-linux/" target="_blank">13 Free GTD Online Tools For Mac Windows OR Linux</a></li>
<li><a href="http://regulargeek.com/2011/05/01/as-a-software-engineer-do-you-really-like-your-job/" target="_blank">As A Software Engineer, Do You Really Like Your Job?</a></li>
<li><a href="http://regulargeek.com/2011/03/16/google-sites-becomes-a-real-sharepoint-competitor/" target="_blank">Google Sites Becomes A Real SharePoint Competitor</a></li>
<li><a href="http://regulargeek.com/2011/07/20/36-resources-to-help-you-teach-kids-programming/" target="_blank">36 Resources To Help You Teach Kids Programming</a></li>
</ol>
<p>Why 11 posts? Well, the top post was written one day before the anniversary last year and most of the traffic occurred after the anniversary. The job trends posts continue to have a strong showing regardless of when they were written. The social posts still tend to appear but they are more programming related as well. This continues a <a href="http://regulargeek.com/2010/12/12/happy-3rd-birthday-to-regular-geek/" target="_blank">trend from last year</a>.</p>
<p>Other posts that were very popular from previous years:</p>
<ul>
<li><a href="http://regulargeek.com/2009/02/11/what-programming-language-should-i-learn/" target="_blank">What Programming Language Should I Learn?</a></li>
<li><a href="http://regulargeek.com/2010/05/29/25-free-google-analytics-alternatives/" target="_blank">25 Free Google Analytics Alternatives</a></li>
<li><a href="http://regulargeek.com/2010/02/02/traditional-programming-language-job-trends-february-2010/" target="_blank">Traditional Programming Language Job Trends – February 2010</a></li>
<li><a href="http://regulargeek.com/2010/08/02/traditional-programming-job-trends-august-2010/" target="_blank">Traditional Programming Language Job Trends – August 2010</a></li>
<li><a href="http://regulargeek.com/2010/10/26/google-sites-automation-with-apps-script/" target="_blank">Google Sites Automation With Apps Script</a></li>
<li><a href="http://regulargeek.com/2010/08/18/web-scripting-programming-language-job-trends-august-2010/" target="_blank">Web &amp; Scripting Programming Language Job Trends – August 2010</a></li>
<li><a href="http://regulargeek.com/2010/08/07/12-things-a-programmer-really-needs-to-know/" target="_blank">12 Things A Programmer Really Needs To Know</a></li>
<li><a href="http://regulargeek.com/2008/02/20/7-resume-tips-for-a-software-developer/" target="_blank">7 Resume Tips For a Software Developer</a></li>
<li><a href="http://regulargeek.com/2009/04/02/why-is-data-so-important/" target="_blank">Why Is Data So Important?</a></li>
</ul>
<p>My total statistics at the end of year three show that I was already writing long posts:</p>
<ul>
<li>Months Blogging: 36</li>
<li>Posts Per Month: 11 (400 posts total)</li>
<li>Words Per Post: 851</li>
<li>Total Words In Posts: 339528</li>
</ul>
<p>I continue to go against common wisdom and write posts that are even longer:</p>
<ul>
<li>Months Blogging: 48</li>
<li>Posts Per Month: 12.3 (180 posts this year, and 580 posts total)</li>
<li>Words Per Post: 887</li>
<li>Total Words In Posts: 514328 (174,800 words this year)</li>
</ul>
<p>This year saw the introduction of my list posts (93 so far) in the <a href="http://regulargeek.com/category/geek-reading/" target="_blank">Geek Reading category</a>, and the subsequent demise of those posts with the changes in Google Reader. I do hope to restart those posts when I can determine the best way to implement them. Ignoring the Geek Reading posts, my blogging was technically slower than last year, with 87 posts or just over 7 posts per month. I continued the trend of writing far too many words per post, but people keep reading.</p>
<p>The other side of the blog statistic picture is the external statistics. This past year, the blog had solid traffic growth and good subscriber numbers. <a title="Technorati" href="http://technorati.com/" rel="homepage">Technorati</a> continues to change how their rankings work, so my ranking in Technology and InfoTech seems to range from 500 to 5000 and 100 to 2000, respectively. Google changed how their search engine works again, and that seems to have positively impacted traffic to the blog. Feedburner finally dropped FriendFeed from the subscriber numbers, and the count recently peaked at 3010 subscribers.</p>
<p>The social landscape continues to change, as much of the social sharing has been wildly different from previous years. I no longer see traffic from Digg or Google Buzz, but I now see traffic from DZone, LinkedIn and Google+. Facebook and Twitter look to be mainstays in social for any blog.</p>
<p>As always, thank you for your continued reading and social sharing. Hopefully, this blog will continue to be useful to you.</p>
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2011/12/12/happy-4th-birthday-to-regular-geek/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Changes Are Coming</title>
		<link>http://regulargeek.com/2011/07/28/changes-are-coming/</link>
		<comments>http://regulargeek.com/2011/07/28/changes-are-coming/#comments</comments>
		<pubDate>Thu, 28 Jul 2011 13:53:30 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[google reader]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jquery]]></category>
		<category><![CDATA[Web application development]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=3441</guid>
		<description><![CDATA[Changes are coming to RegularGeek! These are not really big changes, but they are something. Because some of the changes affect the content on this blog, I wanted to give you notice of what is coming. First, I hope to finally update the blog with a new logo. I have had the logo for a [...]]]></description>
			<content:encoded><![CDATA[<p>Changes are coming to RegularGeek! These are not really big changes, but they are something. Because some of the changes affect the content on this blog, I wanted to give you notice of what is coming.</p>
<p>First, I hope to finally update the blog with a new logo. I have had the logo for a while, but I need to tweak the design a bit before I update everything. There will probably be category and archive changes soon as well. The categories are not detailed enough, and will move closer to post tags. So, if you have tried to follow only the programming posts or just the social media posts, you may need to take a look at the new categories to see what to follow. The archives have been an annoyance of mine for quite some time. A simple monthly list works fine when you are under 18 months, but after that the list gets too lengthy. So, I will be looking into new ways to navigate the archives. Any recommendations on these items or the navigation are welcome.</p>
<p>Second, there will be a new type of daily post. I have <a title="Helping With Digital Curation" href="http://regulargeek.com/2010/07/27/helping-with-digital-curation/" target="_blank">talked about this</a> in the past, but I will be posting a daily links post. These posts will be in a new category called Geek Reading, and will contain <a href="http://www.google.com/reader/shared/robdiana" target="_blank">my daily Google Reader shares</a>. I have debated doing this for quite some time, but not enough people use <a href="http://reader.google.com/" target="_blank">Google Reader</a> to get those shares. So, a daily link post will get my &#8220;curated and filtered&#8221; choices published to all of the blog readers. For those of you not familiar with my sharing, there are about 35 posts per day taken from tech news, application development and a little bit of business. These posts will be published around noon Eastern time.</p>
<p>Lastly, I will be posting more about HTML5. HTML5 is not just <a class="zem_slink" title="HTML" href="http://en.wikipedia.org/wiki/HTML" rel="wikipedia">HTML</a>, as it also contains CSS3 and a whole bunch of new <a class="zem_slink" title="Application programming interface" href="http://en.wikipedia.org/wiki/Application_programming_interface" rel="wikipedia">APIs</a>. Of course, when you are dealing with APIs that means you are writing <a class="zem_slink" title="JavaScript" href="http://en.wikipedia.org/wiki/JavaScript" rel="wikipedia">JavaScript</a>. With JavaScript, I will probably focus on core JavaScript and some <a class="zem_slink" title="JQuery" href="http://jquery.com/" rel="homepage">jQuery</a> as that is what I normally use, though I could talk about other interesting libraries at times. More importantly, I want to ensure that you get content that talks about all aspects of web application development, not just server-side programming. I am more of a generalist when it comes to application development, so it makes sense for the content to deal with more aspects of web development.</p>
<p>So, that is what is changing here. Hopefully, the majority of readers will like the changes. As always, let me know what you think in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2011/07/28/changes-are-coming/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Guest Posts on RegularGeek!</title>
		<link>http://regulargeek.com/2011/03/06/guest-posts-on-regulargeek/</link>
		<comments>http://regulargeek.com/2011/03/06/guest-posts-on-regulargeek/#comments</comments>
		<pubDate>Sun, 06 Mar 2011 14:00:14 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[guest post]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=2990</guid>
		<description><![CDATA[For quite some time now, I have been getting requests for guest posts on this blog. Well, I finally crafted guest post guidelines, so guest posts are now being accepted! If you are interested in guest posting, please review those guidelines. There are a lot of things that are important with guest posts, like what kind [...]]]></description>
			<content:encoded><![CDATA[<p>For quite some time now, I have been getting requests for guest posts on this blog. Well, I finally crafted <a title="Guest Post Guidelines" href="http://regulargeek.com/guest-post-guidelines/" target="_blank">guest post guidelines</a>, so guest posts are now being accepted! If you are interested in guest posting, please review those guidelines. There are a lot of things that are important with guest posts, like what kind of formatting is expected as well as what you should be posting about. Those are fairly well covered in the guidelines.</p>
<p>There are some things that I did want to highlight here as well. First, I will not accept posts that are really just one big advertisement. I am trying to help people get more content, and advertisements do not really help anyone. In addition, the posts need to be relevant to the same sort of content that I normally post, so it should be about social media, programming or something like that.</p>
<p>Because I still write somewhat frequently, I will only be posting one guest post per week. I need to see how well this process will work, but that is how this will start. I am very particular about what content appears on this blog, so I also reserve the right to stop accepting posts or change the guidelines at any time. Another type of post I will not accept is the news-related post. Basically, I cannot promise to publish the guest post in a timely manner, so the posts cannot be time sensitive.</p>
<p>With all of that said, I hope that you will submit a guest post for consideration.</p>
<p>The first post will be appearing on Monday, so keep watch on the blog for some new writers!</p>
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2011/03/06/guest-posts-on-regulargeek/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Happy 3rd Birthday To Regular Geek</title>
		<link>http://regulargeek.com/2010/12/12/happy-3rd-birthday-to-regular-geek/</link>
		<comments>http://regulargeek.com/2010/12/12/happy-3rd-birthday-to-regular-geek/#comments</comments>
		<pubDate>Sun, 12 Dec 2010 13:51:52 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[Google Buzz]]></category>
		<category><![CDATA[google reader]]></category>
		<category><![CDATA[linkedin]]></category>
		<category><![CDATA[pagerank]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=2681</guid>
		<description><![CDATA[In this rare double-post weekend, it is time to celebrate the 3rd birthday of this blog. For a short time this year, I wondered whether the blog could continue as work continued to demand my time. There were periods of blogging funk, typically every late summer, when there is little news to talk about and [...]]]></description>
			<content:encoded><![CDATA[<p>In this rare double-post weekend, it is time to celebrate the 3rd birthday of this blog. For a short time this year, I wondered whether the blog could continue as work continued to demand my time. There were periods of blogging funk, typically every late summer, when there is little news to talk about and just as few software development changes. Thankfully, I find blogging fun and an outlet for ideas, so it continues into the coming new year. Conveniently enough, this is also post 400, another simple milestone. I would like to thank all of my readers for making it through the year.</p>
<p>So, the top 10 posts written this year were:</p>
<ol>
<li><a href="http://regulargeek.com/2010/08/07/12-things-a-programmer-really-needs-to-know/" target="_blank">12 Things A Programmer Really Needs To Know</a></li>
<li><a href="http://regulargeek.com/2010/02/02/traditional-programming-language-job-trends-february-2010/" target="_blank">Traditional Programming Language Job Trends &#8211; February 2010</a></li>
<li><a href="http://regulargeek.com/2010/04/04/android-is-apples-burger-king/" target="_blank">Android Is Apple&#8217;s Burger King</a></li>
<li><a href="http://regulargeek.com/2010/03/01/web-2-0-programming-language-job-trends-february-2010/" target="_blank">Web 2.0 Programming Language Job Trends &#8211; February 2010</a></li>
<li><a href="http://regulargeek.com/2010/05/29/25-free-google-analytics-alternatives/" target="_blank">25 Free Google Analytics Alternatives</a></li>
<li><a href="http://regulargeek.com/2010/05/23/android-vs-apple-we-have-seen-this-war-before/" target="_blank">Android vs. Apple, We Have Seen This War Before</a></li>
<li><a href="http://regulargeek.com/2010/08/18/web-scripting-programming-language-job-trends-august-2010/" target="_blank">Web Scripting Programming Language Job Trends &#8211; August 2010</a></li>
<li><a href="http://regulargeek.com/2010/08/02/traditional-programming-job-trends-august-2010/" target="_blank">Traditional Programming Language Job Trends &#8211; August 2010</a></li>
<li><a href="http://regulargeek.com/2010/10/26/google-sites-automation-with-apps-script/" target="_blank">Google Sites Automation With Apps Script</a></li>
<li><a href="http://regulargeek.com/2010/01/07/get-paid-for-your-likes-with-mylikes/" target="_blank">Get Paid For Your Likes With MyLikes</a></li>
</ol>
<p>The job trends posts had a very strong showing overall, and have become an ongoing series of posts. In particular, I will be looking more at trends in specific areas of technology over the course of the next year. There are some interesting technologies like the <a class="zem_slink" title="NoSQL" rel="wikipedia" href="http://en.wikipedia.org/wiki/NoSQL">NoSQL</a> datastores and HTML5 that look to make 2011 very interesting from the software development perspective.</p>
<p>Other posts that were very popular from previous years:</p>
<ul>
<li><a href="http://regulargeek.com/2009/02/11/what-programming-language-should-i-learn/" target="_blank">What Programming Language Should I Learn</a></li>
<li><a href="http://regulargeek.com/2009/04/17/tools-for-unit-testing-java-web-applications/" target="_blank">Tools For Unit Testing Java Web Applications</a></li>
<li><a href="http://regulargeek.com/2009/07/21/what-programming-languages-do-jobs-require/" target="_blank">What Programming Languages Do Jobs Require</a></li>
<li><a href="http://regulargeek.com/2009/04/02/why-is-data-so-important/" target="_blank">Why Is Data So Important</a></li>
<li><a href="http://regulargeek.com/2009/05/14/the-social-network-business-plan-strategies-from-david-silver/" target="_blank">The Social Network Business Plan: Strategies From David Silver</a></li>
<li><a href="http://regulargeek.com/2008/02/20/7-resume-tips-for-a-software-developer/" target="_blank">7 Resume Tips For A Software Developer</a></li>
<li><a href="http://regulargeek.com/2009/08/19/traditional-programming-language-job-trends/" target="_blank">Traditional Programming Language Job Trends &#8211; August 2009</a></li>
<li><a href="http://regulargeek.com/2009/08/25/web-2-0-programming-language-job-trends/" target="_blank">Web 2.0 Programming Language Job Trends &#8211; August 2009</a></li>
</ul>
<p>My total statistics at the end of year two show that I was already writing long posts:</p>
<ul>
<li>Months Blogging: 24</li>
<li>Posts Per Month: 12.4 (154 posts in 2009)</li>
<li>Words Per Post: 777 (914 words per post in 2009)</li>
<li>Total Words In Posts: 231402 (140,730 words in 2009)</li>
</ul>
<p>I continue to go against common wisdom and write posts that are even longer:</p>
<ul>
<li>Months Blogging: 36</li>
<li>Posts Per Month: 11 (100 posts this year)</li>
<li>Words Per Post: 851 (over 1000 words per post! tl;dr)</li>
<li>Total Words In Posts: 339528 (108,126 words this year)</li>
</ul>
<p>The other side of the blog statistic picture is the external statistics. Last year, the blog had some nice growth and good subscriber numbers. <a class="zem_slink" title="Technorati" rel="homepage" href="http://technorati.com">Technorati</a> continues to change how their rankings work, and Google has tried to lessen the importance of <a class="zem_slink" title="PageRank" rel="wikipedia" href="http://en.wikipedia.org/wiki/PageRank">PageRank</a>, so I am really only left with Feedburner subscribers, which have increased by about 1000.</p>
<p>Obviously, the blog has grown considerably again this year. Much of the growth is due to the various social services that are popular like Google Reader, Google Buzz, Digg, Twitter and Facebook. Even <a class="zem_slink" title="LinkedIn" rel="homepage" href="http://www.linkedin.com">LinkedIn</a> is starting to emphasize the sharing links more.</p>
<p>So, thank you to all of you. I will not make promises about making my posts shorter; I do not seem to be able to make a point in fewer than 800 words. This will likely get even worse as I get more into analysis of technology trends and job trends. I do hope you stick around, as the information will still be as useful as ever. Or not, but please keep reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2010/12/12/happy-3rd-birthday-to-regular-geek/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>A Simple Introduction To Playing With Big Data</title>
		<link>http://regulargeek.com/2010/09/19/a-simple-introduction-to-playing-with-big-data/</link>
		<comments>http://regulargeek.com/2010/09/19/a-simple-introduction-to-playing-with-big-data/#comments</comments>
		<pubDate>Sun, 19 Sep 2010 14:00:13 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Probability]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=2420</guid>
		<description><![CDATA[With social media, big data has come to the forefront of technology. Whether you want to continuously search Twitter, aggregate the social activity on several sites, or do some mining of people&#8217;s activity on Facebook, handling big data is critical. There are two questions you need to answer when looking at a project that will [...]]]></description>
			<content:encoded><![CDATA[<p>With social media, big data has come to the forefront of technology. Whether you want to continuously search <a href="http://twitter.com" target="_blank">Twitter</a>, aggregate the social activity on several sites, or do some mining of people&#8217;s activity on <a class="zem_slink" title="Facebook" rel="homepage" href="http://facebook.com">Facebook</a>, handling big data is critical. There are two questions you need to answer when looking at a project that will handle big data. First, how is big data defined and when do I know I am dealing with it? Second, how do I deal with big data?</p>
<h2>How big is big data?</h2>
<p>Big data thresholds change over time. The threshold depends on how well traditional storage mechanisms can deal with the data. Part of the storage problem is hardware related, e.g., can the disk store a file larger than 4GB? That question may not be a big deal now, but 15 years ago it was a major concern. Another question about size is how well an <a class="zem_slink" title="Relational database management system" rel="wikipedia" href="http://en.wikipedia.org/wiki/Relational_database_management_system">RDBMS</a> can store the data. Will the database crash if it tries to manage 100GB of data? Yes, 100GB of data in one database was huge before 2000. Technologies like database partitioning, where a large table is physically split and managed by the database engine, were still young. Now, even open source and free databases have partitioning and replication. The size of big data has increased dramatically as well. When people talk of big data, they mean hundreds of millions of rows in one table and a database potentially over 1TB. Yes, one terabyte. Even though big data is a hot topic, you have few opportunities to really interact with big data. For our purposes, let&#8217;s assume you are going to aggregate data from social services in some way, otherwise this post would be fairly short and uninteresting.</p>
<h2>How do you deal with big data?</h2>
<p>One of the first questions when dealing with any database, big data or smaller data, is what you are going to do with it. Is your primary function searching the data? Are you going to try to analyze the data using typical <a class="zem_slink" title="Data mining" rel="wikipedia" href="http://en.wikipedia.org/wiki/Data_mining">data mining</a> techniques? Are you creating more of a <a class="zem_slink" title="FriendFeed" rel="homepage" href="http://friendfeed.com/">FriendFeed</a>-like reading and browsing service? Knowing your target is very important as it will likely change the way your data is stored as well as affect your choice in technologies. One major assumption that I am making is that you do not want to spend money on expensive tools like <a class="zem_slink" title="Oracle Corporation" rel="homepage" href="http://oracle.com">Oracle</a>, SAS or Informatica. So, what kind of tools and technologies do you need to look at?</p>
<h3>Data Storage</h3>
<p>Data storage is possibly the most important decision when dealing with large amounts of data. Traditional RDBMS software can handle huge amounts of data but sometimes requires extensive knowledge to manage. <a class="zem_slink" title="MySQL" rel="homepage" href="http://www.mysql.com">MySQL</a> can easily handle many data storage needs and it is well known by many developers. It is the easy choice for many people. However, there are a growing number of <a class="zem_slink" title="NoSQL" rel="wikipedia" href="http://en.wikipedia.org/wiki/NoSQL">NoSQL</a> choices that may also make sense. Some of the NoSQL options have very good text search capabilities, while others have been optimized for speed of reads or writes. Knowing how your application will handle data access helps refine this choice. Also, do not forget about the potential of a mixed environment where some data is in an RDBMS and other data is better suited to a NoSQL datastore. There is a large list of categorized NoSQL options at <a href="http://nosql-database.org/" target="_blank">NoSQL-Database.org</a>.</p>
<h3>Data Caching</h3>
<p>No matter how well architected your data storage solution is, sometimes reads are just not fast enough. This will typically happen if you have a highly trafficked site, but maybe there is just some data that does not change too frequently. In order to squeeze as much speed as possible out of your application, you probably want some level of data caching in your application. The basic idea is that your data cache is one big hashmap stored in memory, which allows extremely fast reads. This is much faster than traditional database access or basic file I/O. If you have paid any attention to web application development over the past several years, you have heard of memcached. Memcached is a data caching server that you can use with your application. This is one option you can take, but some people like to have more control over how data caching works with their application. In that case, you need to find a data caching library for your language of choice. For Java, there are several available, and some have been integrated into web frameworks like Spring. In particular, Ehcache has good integration with Spring so you could quickly include data caching in your application.</p>
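<p>To make the &#8220;big hashmap in memory&#8221; idea concrete, here is a minimal sketch in Python. It is not the API of any caching server or library mentioned above; the <code>SimpleCache</code> class, the <code>slow_lookup</code> function and the 60 second expiry are all hypothetical names and values chosen for illustration.</p>

```python
import time


class SimpleCache:
    """A tiny in-memory cache: one dictionary (hashmap) with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.entries = {}           # key mapped to (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = loader(key)
        self.entries[key] = (value, now + self.ttl_seconds)
        return value


def slow_lookup(user_id):
    # Hypothetical stand-in for a database query or file read.
    return {"id": user_id, "name": "user-%d" % user_id}


cache = SimpleCache(ttl_seconds=60)
first = cache.get(42, slow_lookup)    # miss: calls slow_lookup
second = cache.get(42, slow_lookup)   # hit: served from memory
```

<p>The expiry time is the simplest way to handle data that &#8220;does not change too frequently&#8221;. A real cache would also need a size limit and an eviction policy, which this sketch leaves out.</p>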
<h3>Distributed Computing</h3>
<p>If you take the NoSQL route, many of those solutions are meant to be deployed in a distributed environment. In many cases, the software will have an agent running on several servers in order to store some of your data on that server. The master or orchestrator (the terminology could be different) will be configured to know which agent to talk to for the requested data. This is a gross simplification of the process, but it should give you an idea of what to expect. <a href="http://en.wikipedia.org/wiki/Distributed_computing" target="_blank">Distributed computing</a> has various potential issues as well. If one of your servers crashes, or even if you have to perform some maintenance on a server, how do you continue to retrieve the data stored on that server? Is the data replicated to multiple agents in order to provide simple fail-over capabilities? Do you need to provide your own clustering solution to support the data storage? In some cases, you may even feel that the existing software does not provide you with a good enough solution, so you need to build your own. Distributed computing is at the center of many solutions when dealing with big data. Knowing more about how things work will give you a better idea of how to architect your solution as well as what failure points may exist.</p>
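<p>The master-and-agents arrangement described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular NoSQL datastore works: the <code>Router</code> class, the agent names and the hash-based placement rule are all hypothetical, and the &#8220;agents&#8221; are plain dictionaries in one process rather than separate servers.</p>

```python
import hashlib


class Router:
    """Plays the master role: decides which agents hold a given key."""

    def __init__(self, agent_names, replicas=2):
        self.agent_names = list(agent_names)
        self.replicas = replicas
        self.stores = {name: {} for name in self.agent_names}  # one dict per agent
        self.down = set()                                       # agents marked offline

    def agents_for(self, key):
        """Pick `replicas` consecutive agents, starting at a hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.agent_names)
        return [self.agent_names[(start + i) % len(self.agent_names)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # A real system must also handle writes to agents that are offline;
        # this sketch writes to every replica without checking.
        for name in self.agents_for(key):
            self.stores[name][key] = value

    def get(self, key):
        """Read from the first replica that is still online."""
        for name in self.agents_for(key):
            if name not in self.down:
                return self.stores[name].get(key)
        raise RuntimeError("all replicas for %r are offline" % key)


router = Router(["agent-a", "agent-b", "agent-c"], replicas=2)
router.put("tweet:1001", "hello big data")
primary = router.agents_for("tweet:1001")[0]
router.down.add(primary)                 # simulate a crash or maintenance
value = router.get("tweet:1001")         # still served by the second replica
```

<p>Because each value is written to two agents, taking one of them offline does not lose the data. If both replicas are offline, the read fails, which is exactly the kind of failure point you need to plan for.</p>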
<h3>Search</h3>
<p>Search is a separate field entirely due to its focus on relevance and speed. Speed is critical in search because nobody wants to wait more than a minute for reasonable results. Thanks to <a class="zem_slink" title="Google" rel="homepage" href="http://google.com">Google</a>, the longer a user waits, the higher the expectations will be. For example, if I wait a minute to get results from a search engine, I would expect them to be highly relevant to my question. Google&#8217;s focus on speed with good enough results definitely changed how we interact with search. If your application will have significant search requirements, you need to look at your data storage to determine whether search is core to its function or whether you need an external solution. In years past, search was the domain of the RDBMS vendors, but the rise of the internet and Google has changed things. Search is no longer just about finding the structured data in your database; it now looks at anything on the internet. There are several search projects at Apache that deal with different levels of search. <a class="zem_slink" title="Lucene" rel="homepage" href="http://lucene.apache.org">Lucene</a> is the core search engine index software and can be considered a low-level search technology. <a href="http://lucene.apache.org/solr/" target="_blank">Solr</a>, using the Lucene libraries, provides search through web services in order to keep search as a distinct application outside of your application. Solr and Lucene are focused on keyword searches, just like most search technology. <a href="http://nutch.apache.org/" target="_blank">Nutch</a>, also built on Lucene, is Apache&#8217;s answer to web crawling, so if you want to search the contents of various web pages, this is the solution for you.</p>
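<p>Keyword search is usually built on an inverted index, a map from each word to the documents that contain it. The sketch below shows only that basic idea in Python. It does not use Lucene or Solr, it ignores relevance ranking entirely, and the function names and whitespace tokenizing are assumptions made for illustration.</p>

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results = results.intersection(index.get(word, set()))
    return results
```

<p>Lookups are fast because the work of reading every document happens once, when the index is built, rather than on every query.</p>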
<h3>Probability, Statistics and Machine Learning</h3>
<p>If you decide to do any analysis of your data, there is a host of information you may need to review. If you are planning to graph trends or even simply report on your data, then you need a basic understanding of some simple statistics. You do not need a deep understanding, but even gaining knowledge of standard deviations could prove valuable. If you decide that you want to take your trends a step further and look at expected trends or even simple prediction, probability will rear its ugly head. Just like statistics, some simple concepts in probability will go a long way for many web applications. However, there are times when statistics and probability do not give you the results or the functionality that you desire. At that point you will need to delve into the realm of machine learning. This is not something to jump into lightly, as machine learning relies on some advanced statistics, probability and mathematics. In some cases, you may be able to treat things like a black box and implement an algorithm for simple categorization, like <a href="http://en.wikipedia.org/wiki/Naive_bayes" target="_blank">naive Bayes</a>, but it may not give you the results you desire. In those cases, you may need to understand more of how these machine learning algorithms work in order to determine what the best approach may be. This may be a difficult area to understand, but you can do some amazing things with machine learning. How cool would it be to personalize your site based on the user&#8217;s past behavior without the user needing to explicitly select categories or keywords?</p>
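<p>As a small example of how far the standard deviation alone can take you, the following Python sketch flags days whose traffic is far from the average. The function name, the sample idea of daily counts and the threshold of two standard deviations are assumptions chosen for illustration, not a recommendation.</p>

```python
import statistics


def unusual_days(daily_counts, threshold=2.0):
    """Return the positions of counts more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(daily_counts)
    spread = statistics.pstdev(daily_counts)  # population standard deviation
    if spread == 0:
        return []                             # every day is identical, nothing stands out
    return [
        day
        for day, count in enumerate(daily_counts)
        if abs(count - mean) / spread > threshold
    ]
```

<p>This is plain descriptive statistics. Nothing here predicts anything, which is where probability and eventually machine learning come in.</p>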
<h3>Do I need a PhD to do all this?</h3>
<p>Typical databases are easy to work with. You can use a GUI to create a database and some tables. You can write a query to get back information. Big data changes everything, and there are a lot of technologies that try to make things easier. Thankfully, you do not need a PhD to work with big data, because many tools and libraries have been created to make these technologies more accessible to the typical developer. Sometimes more advanced knowledge would be helpful, but in many cases you might be able to treat the technologies as a black box, just like your old RDBMS. You might also think that your case is special and nobody has done anything like it before. If you are developing a web application, I highly doubt what you are doing is really unknown. It may not be known to you, but there may be academic papers explaining things or even solutions in an unrelated field. Big data did not start with social media; it really started in financials, pharmaceuticals and health information. So, if you can&#8217;t find something specific to fit your needs, broaden your search and the information is probably out there.</p>
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2010/09/19/a-simple-introduction-to-playing-with-big-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Helping With Digital Curation</title>
		<link>http://regulargeek.com/2010/07/27/helping-with-digital-curation/</link>
		<comments>http://regulargeek.com/2010/07/27/helping-with-digital-curation/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 17:56:49 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[digital curation]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[Google Buzz]]></category>
		<category><![CDATA[google reader]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=2142</guid>
		<description><![CDATA[I have previously talked about digital curation and that you should follow my shares in Google Reader. The problem is that only some of you use Google Reader, some use Google Buzz and everyone else is using Twitter and Facebook. I do not want to clutter my Twitter stream with my 40 shares per day, [...]]]></description>
			<content:encoded><![CDATA[<p>I have previously talked about <a href="http://regulargeek.com/2009/08/26/rss-human-filters-and-real-time-streams/" target="_blank">digital curation</a> and that you should follow <a href="http://www.google.com/reader/shared/robdiana" target="_blank">my shares in Google Reader</a>. The problem is that only some of you use Google Reader, some use <a class="zem_slink" title="Google Buzz" rel="homepage" href="http://buzz.google.com">Google Buzz</a> and everyone else is using <a class="zem_slink" title="Twitter" rel="homepage" href="http://twitter.com">Twitter</a> and <a class="zem_slink" title="Facebook" rel="homepage" href="http://facebook.com">Facebook</a>. I do not want to clutter my Twitter stream with my 40 shares per day, so I am thinking of taking a slightly different direction. I also did not want to push daily content down <a href="http://feeds.feedburner.com/RegularGeek" target="_blank">my RSS feed</a> without seeing how the readers of this blog felt about the idea.</p>
<p>So, I am thinking of creating a daily post that is the list of articles that I am sharing from Google Reader. This is a typical links post, but there will be somewhere between 30 and 40 links per day. The post will probably be published in the morning around 9AM US-Eastern time. Please vote in the poll below. If you have other comments, feel free to comment on this as well.</p>
Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2010/07/27/helping-with-digital-curation/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Regular Geek Gets A Redesign Mostly</title>
		<link>http://regulargeek.com/2010/07/10/regular-geek-gets-a-redesign-mostly/</link>
		<comments>http://regulargeek.com/2010/07/10/regular-geek-gets-a-redesign-mostly/#comments</comments>
		<pubDate>Sat, 10 Jul 2010 12:33:23 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[blog design]]></category>
		<category><![CDATA[rss]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=1985</guid>
		<description><![CDATA[This post is just a quick note to let people know that the blog has visually changed. There is a new layout, not completely different than before, but definitely different. One thing you will notice on the right side are the new social images. There are convenient buttons you can use to follow me on [...]]]></description>
			<content:encoded><![CDATA[<p>This post is just a quick note to let people know that the blog has visually changed. There is a new layout, not completely different than before, but definitely different.</p>
<p>One thing you will notice on the right side is the new set of social images. There are convenient buttons you can use to follow me on <a href="http://twitter.com/robdiana" target="_blank">Twitter</a>, <a href="http://www.facebook.com/robdiana" target="_blank">Facebook</a> or <a href="http://profiles.google.com/robdiana" target="_blank">Google Buzz</a>. The other sites where I am less active are still listed on the About page. There are also big buttons for following the <a href="http://regulargeek.com/" target="_blank">RegularGeek</a> blog on <a href="http://twitter.com/regulargeek" target="_blank">Twitter</a>, via <a href="http://feeds.feedburner.com/RegularGeek" target="_blank">RSS</a> or <a href="http://feedburner.google.com/fb/a/mailverify?uri=RegularGeek" target="_blank">Email</a>. The RegularGeek Twitter account currently has only posts from the blog, but I am looking for interesting ways to use it if you have any ideas.</p>
<p>Another change I wanted to point out is the social sharing options on each post. Beneath the post title you can see four icons that you can use to share RegularGeek posts with <a class="zem_slink" title="Twitter" rel="homepage" href="http://twitter.com">Twitter</a>, <a class="zem_slink" title="Google Buzz" rel="homepage" href="http://buzz.google.com">Google Buzz</a>, <a class="zem_slink" title="Facebook" rel="homepage" href="http://facebook.com">Facebook</a> and <a class="zem_slink" title="Digg" rel="homepage" href="http://digg.com">Digg</a>. These icons also appear at the bottom of each post. Feel free to share the posts on other social sites too.</p>
<p>Over the rest of the summer, other things will be changing as well. I will eventually have a logo, but that is still being worked on. Some of the navigation will change, and so will the static pages.</p>
<p>So, if you are reading this post in an RSS reader, please take a look at the site to see the new changes. As always, let me know in the comments if you like it or hate it.</p>
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2010/07/10/regular-geek-gets-a-redesign-mostly/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What Interview Questions Are Helpful In Hiring Developers?</title>
		<link>http://regulargeek.com/2010/05/07/what-interview-questions-are-helpful-in-hiring-developers/</link>
		<comments>http://regulargeek.com/2010/05/07/what-interview-questions-are-helpful-in-hiring-developers/#comments</comments>
		<pubDate>Fri, 07 May 2010 17:46:08 +0000</pubDate>
		<dc:creator>Rob Diana</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://regulargeek.com/?p=1712</guid>
		<description><![CDATA[Interviews are always an interesting topic, and for software engineers the interview process can take many forms. First, there is the standard interview where you are asked questions regarding the technologies that the potential opportunity uses. Those types of questions are always useful because you can find out if someone is lying on their resume [...]]]></description>
			<content:encoded><![CDATA[<p>Interviews are always an interesting topic, and for software engineers the interview process can take many forms. First, there is the standard interview where you are asked questions regarding the technologies that the potential opportunity uses. Those types of questions are always useful because you can find out if someone is lying on their resume or just how much they really know about a given topic. On the other end of the spectrum, there are the puzzle interviews that <a class="zem_slink" title="Microsoft" rel="homepage" href="http://www.microsoft.com">Microsoft</a> and <a class="zem_slink" title="Google" rel="homepage" href="http://google.com">Google</a> have become famous for, even though Google no longer uses puzzles. Somewhere in between are those interviews where you get a significantly different programming or design question that is meant to take 15 to 30 minutes of the interview.</p>
<p>I have talked about interviews, job searches and hiring before. In one case, I talked about <a href="http://regulargeek.com/2010/03/19/finding-a-job-that-fits/" target="_blank">finding a job that fits</a> from the perspective of the job searcher. Those same questions can be asked by the interviewer to determine whether the prospective employee would be a good fit for the organization. As I stated in that post, some people can thrive in the small company atmosphere, while others require the structure of larger companies. However, those organizational questions still do not get to the heart of the problem. Is this person someone we want to hire? Will this person be productive in our environment? One thing I should note: if your company only hires due to specific technology needs, then you do not need to read any further. Those interviews typically focus on specific technical knowledge, and the rest of this post is not about that. This post is more about those weird puzzle questions.</p>
<p>Are puzzle interview questions helpful in hiring developers? This is an interesting question because so many companies still use puzzle questions in their interview process. The idea behind these questions is that you are supposed to get an idea of how the person thinks through a problem. However, many puzzle questions do not require the interviewee to think for very long. They either get the answer right away, or they require a few hints. A good example of a puzzle question is <a href="http://www.techinterview.org/puzzles/TheRopeBridge.html" target="_blank">The Rope Bridge</a>. It is one of the more relevant puzzles, but how much does it help you determine whether your interviewee will be a good developer in your organization?</p>
<p>The other side of this is the &#8220;fish out of water&#8221; questions. What I mean is that you have the interviewee explain a system for something they have not thought about before. These questions are more easily applied to application design. For example, one design question I have seen is the &#8220;elevator system&#8221;. Unless they worked in <a class="zem_slink" title="Embedded system" rel="wikipedia" href="http://en.wikipedia.org/wiki/Embedded_system">embedded systems</a>, it is unlikely that they have run into this type of situation before. This type of question gives you an idea of how they think through an unknown problem domain. The only issue with something like the elevator system is that people may have problems determining the objects and behaviors within the system. Sometimes an unknown domain means that there will be more questions than answers.</p>
<p>Finally, there are the programming questions that people use. The utility of these questions varies greatly, as their complexity varies as well. You can start with fairly simple questions like &#8220;<a class="zem_slink" title="Bizz buzz" rel="wikipedia" href="http://en.wikipedia.org/wiki/Bizz_buzz">FizzBuzz</a>&#8221;, where you write a function that prints &#8220;fizz&#8221; when a number is divisible by 3, &#8220;buzz&#8221; when the number is divisible by 5 and &#8220;fizzbuzz&#8221; when the number is divisible by both 3 and 5. The solution is quite simple, but it does tend to stump a lot of people. Personally, I do not like &#8220;FizzBuzz&#8221; because it does not tell you much about a senior level engineer. The problem with simple questions is that interviewees may think they are harder than they really are. Just a few months ago, <a href="http://regulargeek.com/2010/02/23/how-do-you-hire-programmers/" target="_blank">I wrote about hiring programmers</a> and one statement is very relevant to this discussion: &#8220;Most people tend to get very nervous during an interview which makes even simple programs somewhat difficult to write.&#8221;</p>
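<p>For reference, one possible FizzBuzz answer in Python looks like the sketch below. Returning the number itself when it is divisible by neither 3 nor 5 is an assumption based on the usual form of the exercise, and the function returns the text rather than printing it so it is easy to check.</p>

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n as text when neither rule applies."""
    if n % 15 == 0:        # divisible by both 3 and 5, so check this first
        return "fizzbuzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)          # assumed fallback, as in the usual form of the exercise


def fizzbuzz_lines(limit):
    """Return the FizzBuzz output for 1 through limit, one entry per number."""
    return [fizzbuzz(n) for n in range(1, limit + 1)]
```

<p>The most common slip is testing for 3 or 5 before testing for both, which means &#8220;fizzbuzz&#8221; is never produced.</p>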
<p>So, if simple programs do not tell you enough, you need something a little harder. For some companies, if you pass an initial phone screen they may ask you to complete a programming assignment or it could even be the entrance criteria for the interview process. Recently, I saw <a href="http://java.dzone.com/articles/best-tech-interview-question" target="_blank">one of these questions on JavaLobby</a>:</p>
<blockquote><p>By starting at the top of the triangle and moving to adjacent numbers on the row below, the maximum total from top to bottom is 27.</p>
<p><strong>5<br />
9</strong> 6<br />
4 <strong>6</strong> 8<br />
0 <strong>7</strong> 1 5</p>
<p>I.e. 5 + 9 + 6 + 7 = 27. Write a program in a language of your choice to find the maximum total  from top to bottom in <a href="http://www.yodle.com/puzzles/triangle.txt" target="_blank">triangle.txt</a>, a text file containing a triangle with 100 rows.</p></blockquote>
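<p>As an aside, one common way to attack the triangle problem quoted above is to work from the bottom row upward, so that each number only needs to know the better of the two totals beneath it. The Python sketch below assumes the file holds one row per line with numbers separated by spaces. The function names are made up for illustration, and this is not the solution from the linked article.</p>

```python
def parse_triangle(text):
    """Turn text with one row per line into a list of lists of ints."""
    return [
        [int(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


def max_path_total(rows):
    """Return the largest top-to-bottom total, moving only to adjacent numbers below."""
    best = list(rows[-1])                  # best totals starting from the bottom row
    for row in reversed(rows[:-1]):
        # each entry adds the better of the two totals directly beneath it
        best = [value + max(best[i], best[i + 1]) for i, value in enumerate(row)]
    return best[0]
```

<p>Trying every path would mean 2<sup>99</sup> routes for a 100-row triangle, while the bottom-up pass looks at each number once.</p>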
<p>This is an interesting problem although it may not be entirely relevant to your future job. One issue with this type of programming question is that you do not see the thought process of the interviewee, and the problem could be too difficult for a 30-minute segment of an interview. So, you need to find something that could be completed within 30 minutes and give you a solid feeling for how competent a person is. One question I have run into that fits this nicely and does not require any domain knowledge is the following:</p>
<blockquote><p>You have multiple lists of numbers that may or may not be sorted and need to be sorted into one large list. How do you do this?</p></blockquote>
<p>In my experience, this problem takes anywhere between 15 and 30 minutes and it requires the person to write some code on a whiteboard. It is an interesting question for a few reasons:</p>
<ul>
<li>it is definitely solvable within 30 minutes</li>
<li>it does not require advanced algorithmic knowledge like <a class="zem_slink" title="Artificial intelligence" rel="wikipedia" href="http://en.wikipedia.org/wiki/Artificial_intelligence">AI</a> or <a class="zem_slink" title="Machine learning" rel="wikipedia" href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a></li>
<li>it can require code without being too extensive</li>
<li>it can lead to other talking points like performance and memory consumption.</li>
</ul>
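<p>One possible whiteboard answer to the list question, sketched in Python, is to sort each list and then merge the sorted lists. The function name is made up for illustration, and this is only one of several reasonable answers. Simply concatenating everything and sorting once is another, and comparing the two is a good way into the performance and memory discussion.</p>

```python
import heapq


def merge_lists(lists):
    """Combine lists that may or may not be sorted into one sorted list."""
    sorted_lists = (sorted(numbers) for numbers in lists)  # sort each list first
    return list(heapq.merge(*sorted_lists))                # then do a k-way merge
```

<p>If the candidate knows the inputs are already sorted, the per-list sort can be skipped and the merge alone is enough.</p>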
<p>The key thing to remember is that you need to be comfortable working with this person as well as being comfortable with their level of knowledge. Finding the appropriate level of detail and complexity in interview questions is like a dark art. Some people figure it out after years of study and refuse to tell you their secrets.</p>
]]></content:encoded>
			<wfw:commentRss>http://regulargeek.com/2010/05/07/what-interview-questions-are-helpful-in-hiring-developers/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>


