{"id":2287,"date":"2010-12-04T16:38:38","date_gmt":"2010-12-05T00:38:38","guid":{"rendered":"http:\/\/www.chesnok.com\/daily\/?p=2287"},"modified":"2010-12-04T21:41:57","modified_gmt":"2010-12-05T05:41:57","slug":"open-data-hackathon-day-oregon-business-license-registry","status":"publish","type":"post","link":"https:\/\/www.chesnok.com\/daily\/2010\/12\/04\/open-data-hackathon-day-oregon-business-license-registry\/","title":{"rendered":"Open Data Hackathon Day: Oregon Business License Registry"},"content":{"rendered":"<p><a href=\"http:\/\/www.chesnok.com\/daily\/wp-content\/uploads\/2010\/12\/IMG_0213.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.chesnok.com\/daily\/wp-content\/uploads\/2010\/12\/IMG_0213-300x168.jpg\" alt=\"\" title=\"IMG_0213\" width=\"300\" height=\"168\" class=\"aligncenter size-medium wp-image-2320\" srcset=\"https:\/\/www.chesnok.com\/daily\/wp-content\/uploads\/2010\/12\/IMG_0213-300x168.jpg 300w, https:\/\/www.chesnok.com\/daily\/wp-content\/uploads\/2010\/12\/IMG_0213-1024x575.jpg 1024w, https:\/\/www.chesnok.com\/daily\/wp-content\/uploads\/2010\/12\/IMG_0213.jpg 1280w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>At the <a href=\"http:\/\/www.chesnok.com\/daily\/2010\/12\/02\/pdx11-the-software-summit-wrapup\/\">Portland Software Summit on Thursday<\/a>, a couple of people mentioned that it was hard to keep track of new businesses as they pop up, and that merger and acquisition activity wasn&#8217;t being sufficiently publicised. <\/p>\n<p>I thought &#8211; maybe we could get this information in an automated way!<\/p>\n<p>I started with the <a href=\"http:\/\/egov.sos.state.or.us\/br\/pkg_web_name_srch_inq.login\">state of Oregon&#8217;s business registry search site<\/a>. Unfortunately, business searches are capped at 1,000 results, and the results aren&#8217;t paginated. 
So we kicked ScraperWiki into gear and wrote a very simple scraper with @<a href=\"http:\/\/twitter.com\/maxogden\">maxogden<\/a>: <a href=\"http:\/\/scraperwiki.com\/scrapers\/oregon_business_registry\/\">http:\/\/scraperwiki.com\/scrapers\/oregon_business_registry\/<\/a><\/p>\n<p>Next, I wanted information about businesses specifically in Portland. The City publishes it, but only as PDFs: <a href=\"http:\/\/www.portlandonline.com\/omf\/index.cfm?c=32192\">http:\/\/www.portlandonline.com\/omf\/index.cfm?c=32192<\/a><\/p>\n<p>I wrote a quick-and-dirty Python script to extract the data, and it currently picks up roughly 250 of the 300+ businesses in the November release. Next, I want to cross-reference this data with what&#8217;s in the Oregon registry. I&#8217;ll publish the Python scripts over the weekend. Hopefully ScraperWiki will add pyPdf to the Python libraries it supports, so I can publish the transform there and link it easily to the Oregon data.<\/p>\n<p>Two lessons today: <\/p>\n<ul>\n<li>Governments: Please don&#8217;t publish data in PDFs. YUCK.<\/li>\n<li>Governments: Please paginate results from your site! Hard limits are just kinda lame.<\/li>\n<\/ul>\n<p>The alternative to scraping the state of Oregon&#8217;s site is to order a CD-ROM for $50. I think this is such a stupid profit center for the state. I&#8217;d be interested to know how much money they&#8217;re really making from it, and whether they could take a page out of Metro&#8217;s book and share the data through a different, more useful service.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At the Portland Software Summit on Thursday, a couple of people mentioned that it was hard to keep track of new businesses as they pop up, and that merger and acquisition activity wasn&#8217;t being sufficiently publicised. 
I thought &#8211; maybe we could &hellip; <a href=\"https:\/\/www.chesnok.com\/daily\/2010\/12\/04\/open-data-hackathon-day-oregon-business-license-registry\/\">Continue reading &rarr;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[387,386,149,388],"class_list":["post-2287","post","type-post","status-publish","format-standard","hentry","category-portland","tag-odhd","tag-open-data-hackathon-day","tag-pdx","tag-scraperwiki"],"_links":{"self":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts\/2287","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/comments?post=2287"}],"version-history":[{"count":5,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts\/2287\/revisions"}],"predecessor-version":[{"id":2334,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts\/2287\/revisions\/2334"}],"wp:attachment":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/media?parent=2287"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/categories?post=2287"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/tags?post=2287"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}