Started July 2008, Ended June 2009
This is another product I created for Prime Vendor in Wilmington, NC, to collect bid information posted by government agencies across the United States.
The core problem here was scale: hundreds of thousands of organizations in the United States government post bids for private contractors to, well, bid upon. Many of these organizations post multiple new bids every day. Some of those listings are streamed straight out of a database but, surprisingly, lots of these government agencies still relied on human beings to update their tables of offers by hand. My job was to create a network of 'spiders' that visit these pages, check whether anything has changed since the last visit and, if so, download the new data and submit it as a new bid. There were, of course, complications.
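The change-check at the heart of each spider can be reduced to comparing a fingerprint of the page against the one recorded on the previous visit. The sketch below is purely illustrative (the original system is not described at this level of detail); it assumes an in-memory `last_seen` store, where the real system would persist fingerprints between runs.

```python
import hashlib

def page_fingerprint(html: str) -> str:
    """Return a stable fingerprint of a page's current content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def has_changed(html: str, last_seen: dict, url: str) -> bool:
    """Compare against the fingerprint from the previous visit,
    then record the new fingerprint for next time."""
    fp = page_fingerprint(html)
    changed = last_seen.get(url) != fp
    last_seen[url] = fp
    return changed

# Usage: first visit always counts as changed; identical revisits do not.
seen = {}
has_changed("<table>bid 1</table>", seen, "http://agency.example/bids")
```

In practice a spider would strip timestamps, ads, and other volatile markup before hashing, or fingerprint only the bid table itself, so that cosmetic page changes don't trigger false downloads.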
Under this system, handling pages of any complexity, with or without login screens, became trivial: writing a spider for a new government site took only 5-15 minutes. Over the course of the year I worked there, the sheer number of sites we were actively spidering grew too large for one machine to execute them all in serial. The first solution to this problem was to add a multi-threaded download manager to the browser, so the spiders could parse and traverse pages as quickly as they could while delegating the actual downloading of bids to this background worker. It did not take long for this to become prohibitive as well, and the need for a complete distributed spider network spanning several machines became apparent.
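The hand-off between the parsing spiders and the background downloader follows a standard producer/consumer pattern. This is a minimal sketch, not the original implementation: the `fetch` callable and the worker count are assumptions, standing in for whatever HTTP client the real system used.

```python
import queue
import threading

class DownloadManager:
    """Background worker threads pull URLs off a queue so that
    spiders can keep parsing pages instead of waiting on downloads."""

    def __init__(self, fetch, workers: int = 4):
        self.fetch = fetch               # callable that actually downloads a bid
        self.tasks = queue.Queue()
        self.results = []
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            url = self.tasks.get()
            try:
                data = self.fetch(url)
                with self._lock:
                    self.results.append((url, data))
            finally:
                self.tasks.task_done()

    def submit(self, url: str):
        """Called by a spider: hand off the download and return immediately."""
        self.tasks.put(url)

    def wait(self):
        """Block until every submitted download has finished."""
        self.tasks.join()
```

The same queue abstraction is what makes the later distributed version a natural step: replace the in-process `queue.Queue` with a network-backed queue and the workers can live on other machines.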
That's exactly what I created next: an entire network of spiders, all working independently, constantly monitoring government agencies for new contract bids and downloading them the instant they became available. The bids were then streamed in real time to all of our clients, who could bid on the ones of their choice.
As an extension of this, we also had an internal website listing all the contractors in the United States, which our sales team used to produce new leads and sell more software; most of the list, however, lacked any contact information. To fix that, I added a button that, when clicked, would spin up a spider on one of the back-end servers and google the company with some extra keywords to home in on it and throw out invalid results. The spider would return a ranked list of the top 5 results, of which the first was usually the correct one, and let the user select the right match. That selection updated the website address on the company listing, and, with a couple more clicks, they'd also have a phone number and e-mail address if the company posted them on its site.
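The ranking step can be illustrated as scoring each search hit by how many of the query terms appear in its title and snippet. This is a hypothetical heuristic for illustration only: the original system leaned on Google's own ranking, and the `results` structure here (dicts with `title` and `snippet` keys) is an assumed shape, not a real search API.

```python
def rank_results(results, company, keywords, top_n=5):
    """Score each search hit by how many query terms appear in its
    title + snippet, and return the top_n highest-scoring hits."""
    terms = {company.lower(), *(k.lower() for k in keywords)}

    def score(hit):
        text = (hit["title"] + " " + hit["snippet"]).lower()
        return sum(1 for t in terms if t in text)

    return sorted(results, key=score, reverse=True)[:top_n]
```

Dropping hits that score zero would be the simplest way to "throw out invalid results" before presenting the shortlist to the salesperson.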
I left Prime Vendor in June of 2009, at which point this project was taken over by a young developer whom I had interviewed to fill my position.