Hi,

I was curious whether folks with experience in this sort of thing could suggest a good library for scraping the data from this database of books: http://bit.ly/i8bvJS

The goal is to pull down each book's title, author and 'Guided Reading' level.

Python or R libraries would be ideal, as those are the languages/platforms I use the most.

Based on the structure of that particular page, I would guess that some tools would be better or easier than others, so I would love any advice that points me in the right direction!

asked 02 Mar '11, 22:13

almartin

There is a service called ScraperWiki, which lets users create scrapers themselves or request that volunteers create them.

It currently lets you write scrapers in Python, Ruby and PHP, and it also provides an API.
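For a sense of what a small Python scraper for this task might look like, here is a minimal sketch using only the standard library. It assumes the results are laid out as an HTML table with title, author and Guided Reading level in the first three columns; the real page's structure has not been checked, so `SAMPLE_HTML`, `TableRowParser` and `extract_books` are all hypothetical names and the sample rows are placeholders, not real data.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the real page, whose structure is unknown.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Author</th><th>Guided Reading</th></tr>
  <tr><td>Example Book One</td><td>A. Author</td><td>J</td></tr>
  <tr><td>Example Book Two</td><td>B. Writer</td><td>M</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells, so they end up empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_books(html):
    """Return one dict per table row, assuming title/author/level column order."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [
        {"title": row[0], "author": row[1], "guided_reading": row[2]}
        for row in parser.rows
        if len(row) >= 3
    ]


if __name__ == "__main__":
    for book in extract_books(SAMPLE_HTML):
        print(book)
```

To run this against the live site you would fetch the page first (for example with `urllib.request.urlopen`) and pass the decoded HTML to `extract_books`, adjusting the column order to whatever the page actually uses.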

answered 03 Mar '11, 01:52

naesk

Also see the answers to the GetTheData question "What tools or services are good for scraping data from websites?"

answered 03 Mar '11, 16:45

psychemedia ♦♦

Asked: 02 Mar '11, 22:13

Seen: 862 times

Last updated: 03 Mar '11, 16:45
