Office, programming

A Ning Network + RSS Feeds + PHP cURL + MySQL + GAE -> WTF?!

We use Ning at our company for the developer community activities around our product. We wanted to know whether any forum topics were unanswered, which topics had been discussed in the last week, whether any blogs had been written, and things like that. To put it simply: what happened in the past week or month.

I Googled to see if anything like that was available; as my luck would have it, I didn’t find any. So I decided to write my own. I went to the Ning API documentation to see if there was any way to get the data, and again hit a dead end: it is not available as of now. But then I remembered seeing an RSS feed of activities on the site, so I decided to dig a little more into it, and landed up at this page. Armed with this piece of info, and since this kind of stuff can’t be done with our product as of now (if only they provided the data as a SOAP service), I sat down to write something to parse the feed, read the data, pick out the required items, store them and do the reports.

So I sat down to write the feed parser. I decided to use the hosting which I had signed up for some time back but never used. I used PHP cURL to read the feeds, process them and store them in MySQL tables. Everything went well, until they blocked cURL and socket programming. I was not in a mood to search for another host which supported this. Also, this setup required me to keep the PHP page open in a browser so that it could refresh every 15 minutes and get the new data. So I decided to look into something with scheduling and such.
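A rough sketch of the parsing step, in Python rather than the PHP described above, and using only the standard library. The sample feed, its URLs and the `parse_items` name are made up for illustration; the real activity feed may well carry different tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real run would fetch this text over HTTP instead.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example activity feed</title>
    <item>
      <title>Alice replied to a discussion</title>
      <link>http://example.com/forum/topics/1</link>
      <pubDate>Mon, 01 Feb 2010 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Bob posted a blog entry</title>
      <link>http://example.com/profiles/blogs/2</link>
      <pubDate>Tue, 02 Feb 2010 12:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return one dict per <item> with the fields needed for the reports."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the tag is missing
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


for entry in parse_items(SAMPLE_FEED):
    print(entry["published"], "|", entry["title"])
```

Keeping the parser as a plain function over a string means the fetching part (cURL, a scheduler, whatever the host allows) can be swapped out without touching it.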

While doing something similar at work, the idea of using Google App Engine for this dawned on me. But App Engine supported only Python and Java, which would mean porting from PHP to one of these languages. Then again, it was Java which gave me a ray of hope: since there are many JVM-compatible languages around, if Java was supported, then indirectly these languages should also be supported. After a little googling I got this information. And there it was: PHP will work via Quercus. After digging around their forums and looking around, I decided to go ahead and do it. If it comes up well, then good; else, it would be one helluva ride, getting to know App Engine, Quercus and stuff.

And challenging it was. Though everything worked without major changes, except storing the data (I was not understanding the examples on how to do it), some problems cropped up. First, fetching of Atom feeds was not working. Something was amiss. I kept making changes and couldn’t tell what fixed it, but after a day it started to work. There was still a problem with getting the data, though: the feed parsing was not returning the data for certain tags. It turned out it did not return the description tag’s data in the RSS feed (it was wrapped as CDATA, and was working inconsistently). I couldn’t figure out why. Since I was pressed for time, I decided to fight it another day, and scraped the required data from another tag’s data.
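The fallback idea can be sketched like this in Python. It is only an illustration: which tag the data was actually scraped from is not stated above, so falling back to `title` is an assumption, and `strip_tags` and `item_summary` are made-up helper names.

```python
import re
import xml.etree.ElementTree as ET


def strip_tags(html):
    """Crude tag stripper: drop anything in <...> and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()


def item_summary(item):
    """Prefer the description; fall back to the title when it comes back empty."""
    description = (item.findtext("description") or "").strip()
    if description:
        return strip_tags(description)
    # Assumption: the title carries enough information for the report.
    return (item.findtext("title") or "").strip()


# Hypothetical items: one with a CDATA description, one with an empty one.
with_cdata = ET.fromstring(
    "<item><title>Alice replied</title>"
    "<description><![CDATA[<p>Alice replied to <a href='#'>Install help</a></p>]]>"
    "</description></item>"
)
without_description = ET.fromstring(
    "<item><title>Bob posted a blog entry</title><description></description></item>"
)

print(item_summary(with_cdata))
print(item_summary(without_description))
```

In Python's `ElementTree` a CDATA section simply shows up as the element's text, so the fallback branch only fires when the description is really missing or empty.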

Next it was Task Queues. It was not a big difficulty; actually it was the easiest part, since I had a working example.

At this stage, I was on the verge of losing my patience. It had taken close to 2 weeks, and I was still fighting (I was on it only on weekends, and you can imagine what the distractions were like: cartoons, movies, parents, cooking, going out, cricket, friends, Twitter, Facebook, sleeping, caring for the bike, etc.). I was pretty much tired of making changes, and just wanted something which would get the work done. Instead of using the Datastore for storing the data and then trying to figure out how to deal with aggregation and joins, I decided to go with the MySQL database on my hosting. So I whipped up a quick script to send data from GAE to my hosting, a PHP page on my hosting to accept the data via POST, validate it and store it in MySQL, and another page to display the information the way I had wanted it.
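Roughly, the send / validate / store round trip looks like the sketch below. It is written in Python with an in-memory SQLite database standing in for MySQL so that it runs anywhere; the field names, the `activity` table and the rule of using the link as the unique key are all assumptions for the sake of the example, not the actual schema.

```python
import sqlite3
from urllib.parse import parse_qs, urlencode

REQUIRED_FIELDS = ("title", "link", "published")  # hypothetical field names


def encode_activity(activity):
    """Sender side: build the form-encoded body of the HTTP POST."""
    return urlencode(activity)


def validate(body):
    """Receiver side: return a clean dict, or None if the body is unusable."""
    # parse_qs drops blank values, so an empty field counts as missing.
    fields = {key: values[0].strip() for key, values in parse_qs(body).items()}
    if any(not fields.get(name) for name in REQUIRED_FIELDS):
        return None
    if not fields["link"].startswith(("http://", "https://")):
        return None
    return {name: fields[name] for name in REQUIRED_FIELDS}


def make_db():
    """SQLite stands in for the MySQL database here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity ("
        "link TEXT PRIMARY KEY, title TEXT NOT NULL, published TEXT NOT NULL)"
    )
    return conn


def store(conn, activity):
    """Parameterised insert; a link that is already stored is skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO activity (link, title, published) VALUES (?, ?, ?)",
        (activity["link"], activity["title"], activity["published"]),
    )
    conn.commit()


conn = make_db()
body = encode_activity({
    "title": "Bob posted a blog entry",
    "link": "http://example.com/profiles/blogs/2",
    "published": "2010-02-02",
})
record = validate(body)
if record is not None:
    store(conn, record)
print(conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0])
```

Skipping duplicates at insert time matters because the same feed items come back on every poll until they scroll out of the feed.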

At last I have it, something that does what I wanted. It is a bit of a messed-up solution, cooked up from all sorts of nonsense ideas (they seemed like nonsense when I first thought of ’em) that I came up with.

Now I need to convert it to use the Datastore for storing data, which is a trivial task. But to do the reports, I think I will have to change a lot of things, possibly the data model structure I am using now. Since the Datastore does not support joins, it is going to be a challenge for me. (I am still the same person who is not comfortable writing any queries other than select * from table where… and delete from table where….)
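One way around the missing joins, sketched here in plain Python rather than against the actual Datastore API, is to store each activity as one flat, self-contained record and do the weekly grouping in application code. The record layout and the `weekly_counts` name are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical denormalised records: everything a report needs is on the row
# itself, so no join against an authors or topics table is required.
ACTIVITIES = [
    {"kind": "forum_reply", "author": "Alice", "day": date(2010, 2, 1)},
    {"kind": "forum_reply", "author": "Bob", "day": date(2010, 2, 3)},
    {"kind": "blog_post", "author": "Bob", "day": date(2010, 2, 4)},
    {"kind": "forum_reply", "author": "Alice", "day": date(2010, 2, 9)},
]


def weekly_counts(activities):
    """Count activities per (ISO year, ISO week, kind) in application code."""
    counts = Counter()
    for activity in activities:
        year, week, _weekday = activity["day"].isocalendar()
        counts[(year, week, activity["kind"])] += 1
    return counts


for (year, week, kind), total in sorted(weekly_counts(ACTIVITIES).items()):
    print(year, week, kind, total)
```

The trade-off is some duplicated data on every record in exchange for reports that need nothing more than a simple filter plus a loop.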

Like I said at the beginning: if it comes up well, then good; else, it would be one helluva ride. But frankly, it was both, and boy, what a lot of fun it was… :)

Links I looked at when I was stuck.

The first three are essentially the same… I would love to share what I have done. I will, after completing the Datastore part of it.