Web Scraping Fridays
From HacDC Wiki
Edited by Mike Chelen and Julialongtin.
Revision as of 01:28, 18 July 2015
CFAA
Tools
Picking Victims
Examples
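# Extract House bill numbers (H.R.<n>) from the saved congress.gov search-results page, de-duplicate them, and download each bill's page.
sed -n 's/.*H\.R\.\([0-9]\{1,\}\).*/\1/p' '../search?q=%7B%22source%22%3A%22legislation%22%2C%22congress%22%3A%22114%22%7D&pageSize=250' | sort -un | while read -r bill; do wget "https://www.congress.gov/bill/114th-congress/house-bill/$bill"; done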
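The scraping command above extracts and downloads in one pass, which makes the bill-number pattern hard to check on its own. A minimal sketch, assuming a POSIX shell plus `grep -o` (available in GNU and BSD grep) and wget: the hypothetical functions `hr_numbers`, `bill_urls` and `fetch_bills` split the steps, pick up every H.R. number on a line rather than only the last one, and hand all URLs to a single wget so `--wait` can space the requests out.

```shell
# Sketch only: function names are hypothetical, not from the original page.

# Print each distinct House bill number found on stdin. grep -o puts every
# "H.R.<digits>" match on its own line, so several bills on one line all count.
hr_numbers() {
  grep -o 'H\.R\.[0-9][0-9]*' | sed 's/^H\.R\.//' | sort -un
}

# Turn bill numbers on stdin into 114th-Congress bill-page URLs.
bill_urls() {
  while read -r n; do
    printf 'https://www.congress.gov/bill/114th-congress/house-bill/%s\n' "$n"
  done
}

# Fetch every bill listed in a saved search page ($1) with one wget run,
# waiting 2 seconds between requests to go easy on the server.
fetch_bills() {
  hr_numbers < "$1" | bill_urls | wget --wait=2 -i -
}

# Offline demo: prints the URLs for bills 7 and 12, once each.
printf '%s\n' '<a>H.R.12</a> <a>H.R.7</a>' '<a>H.R.12</a>' | hr_numbers | bill_urls
```

Against the saved search page it would be called as `fetch_bills '../search?q=%7B%22source%22%3A%22legislation%22%2C%22congress%22%3A%22114%22%7D&pageSize=250'`. Reading the URLs from stdin with `-i -` keeps everything in one wget process, which is what lets `--wait` apply between bill pages.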
# Print only the bill text that a saved "?format=txt" page (here the rh, "Reported in House", version) wraps in <pre>...</pre>, then page through it.
sed -n '/<pre/,/<\/pre>/{ s/.*<pre[^>]*>//; s/<\/pre>.*//; p; }' 'rh?format=txt' | less
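The text-extraction step above can also be wrapped as a reusable filter. This is a sketch under stated assumptions, not the original author's method: the helper name `pre_text` is hypothetical, and it assumes the bill text sits in a single `<pre>` element whose opening and closing tags are on different lines, with `<`, `>` and `&` HTML-escaped inside it.

```shell
# Sketch only: pre_text is a hypothetical helper, not from the original page.
# Prints the lines between <pre ...> and </pre> from HTML on stdin, strips the
# tags, and decodes &lt; &gt; &amp; (assumes the page HTML-escapes the text and
# that <pre> and </pre> are on different lines). &amp; is decoded last so that
# "&amp;lt;" correctly becomes "&lt;" rather than "<".
pre_text() {
  sed -n '/<pre/,/<\/pre>/{
s/.*<pre[^>]*>//
s/<\/pre>.*//
s/&lt;/</g
s/&gt;/>/g
s/&amp;/\&/g
p
}'
}

# Offline demo: the trailing <p> is outside <pre> and is not printed.
printf '%s\n' '<div><pre class="t">SEC. 1. SHORT TITLE.' 'A &lt;&lt;NOTE&gt;&gt; &amp; B</pre></div>' '<p>x</p>' | pre_text
```

On the saved page it would be used as `pre_text < 'rh?format=txt' | less`.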