Web Scraping Fridays: Difference between revisions
From HacDC Wiki
Julialongtin (talk | contribs) (another example) |
Mike Chelen (talk | contribs) (→Schedule: .) |
||
(3 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
= Schedule = | |||
* Aug 14 | |||
* Aug 28 | |||
* Sep 11 | |||
* Sep 25 | |||
= CFAA = | = CFAA = | ||
Line 9: | Line 16: | ||
wget https://www.congress.gov/search?q={%22source%22%3A%22legislation%22%2C%22congress%22%3A%22114%22}&pageSize=250 | wget https://www.congress.gov/search?q={%22source%22%3A%22legislation%22%2C%22congress%22%3A%22114%22}&pageSize=250 | ||
for each in `cat | for each in `cat ./search\?q\=%7B%22source%22%3A%22legislation%22%2C%22congress%22%3A%22114%22%7D\&pageSize\=250 | sed -n "s/.*H\.R\.\([0-9]\{1,\}\).*/\1/p"`; do { wget https://www.congress.gov/bill/114th-congress/house-bill/$each/text?format=txt; } done; | ||
cat rh\?format\=txt | sed "/.*pre>/{s/.*\(pre>\)/FOUNDME:\1/;:a;N;s/^\(FOUNDME:pre>\n\(\|FOUNDME:.*\n\)\{0,\}\)\(.*\)/\1FOUNDMORE:\3/;;s/FOUNDMORE/FOUNDME/;ta}"| grep FOUNDME | sed "/.*div>/{s/.*\(div>\)//;:a;N;s/.*//;;s/FOUNDME//;ta;}" | less | cat rh\?format\=txt | sed "/.*pre>/{s/.*\(pre>\)/FOUNDME:\1/;:a;N;s/^\(FOUNDME:pre>\n\(\|FOUNDME:.*\n\)\{0,\}\)\(.*\)/\1FOUNDMORE:\3/;;s/FOUNDMORE/FOUNDME/;ta}"| grep FOUNDME | sed "/.*div>/{s/.*\(div>\)//;:a;N;s/.*//;;s/FOUNDME//;ta;}" | less |
Latest revision as of 23:50, 11 August 2015
Schedule
- Aug 14
- Aug 28
- Sep 11
- Sep 25
CFAA
Tools
Picking Victims
Examples
for each in `cat ./search\?q\=%7B%22source%22%3A%22legislation%22%2C%22congress%22%3A%22114%22%7D\&pageSize\=250 | sed -n "s/.*H\.R\.\([0-9]\{1,\}\).*/\1/p"`; do { wget https://www.congress.gov/bill/114th-congress/house-bill/$each/text?format=txt; } done;
cat rh\?format\=txt | sed "/.*pre>/{s/.*\(pre>\)/FOUNDME:\1/;:a;N;s/^\(FOUNDME:pre>\n\(\|FOUNDME:.*\n\)\{0,\}\)\(.*\)/\1FOUNDMORE:\3/;;s/FOUNDMORE/FOUNDME/;ta}"| grep FOUNDME | sed "/.*div>/{s/.*\(div>\)//;:a;N;s/.*//;;s/FOUNDME//;ta;}" | less
cat rh\?format\=txt | sed "/.*pre>/{s/.*\(pre>\)/FOUNDME:\1/;:a;N;s/^\(FOUNDME:pre>\n\(\|FOUNDME:.*\n\)\{0,\}\)\(.*\)/\1FOUNDME:\3/;ba}"| sed -n "s/FOUNDME://p" | sed "/.*div>/{s/.*\(div>\)//;:a;N;s/.*//;ba;}" | less