[Document & Workflow] Fetching and capturing pages behind https and basic authentication

Published: 2015/04/27
Updated: 2018/10/24

Writing documentation requires a large number of screenshots, so I am automating the capture process.

[markdown]
In the earlier posts below I had the capture itself working, but it failed on pages protected by basic authentication, so I fixed that.

> * [Taking screenshots of web pages from the command line with capybara-webkit | deadwood](https://www.d-wood.com/blog/2013/12/29_5178.html)
> * [Taking screenshots of web pages from the command line with webkit2png | deadwood](https://www.d-wood.com/blog/2013/12/29_5180.html)
> * [wkhtmltopdf | deadwood](https://www.d-wood.com/blog/2015/04/15_7496.html)

In addition, as a preprocessing step, the HTML can now be fetched and parsed.

> * [Fetching pages behind https and basic authentication with open-uri | deadwood](https://www.d-wood.com/blog/2015/04/09_7468.html)
> * [Fetching non-UTF-8 pages with character conversion using open-uri | deadwood](https://www.d-wood.com/blog/2015/04/09_7471.html)
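
As a rough illustration of the fetch step, the sketch below passes basic-auth credentials to open-uri over https. It is not the code from the linked posts: `fetch_options` and `fetch_html` are hypothetical helper names, the credentials are placeholders, and `URI.open` assumes Ruby 2.5 or later (older code would call `open` directly).

```ruby
require "open-uri"
require "openssl"

# Build the option hash handed to URI.open.
# user/password are placeholders. verify_ssl: false is an assumption meant
# only for test servers with self-signed certificates.
def fetch_options(user:, password:, verify_ssl: true)
  options = { http_basic_authentication: [user, password] }
  options[:ssl_verify_mode] = OpenSSL::SSL::VERIFY_NONE unless verify_ssl
  options
end

# Fetch the page body as a String (performs a network request).
def fetch_html(uri, **auth)
  URI.open(uri, fetch_options(**auth), &:read)
end
```

A call would look like `fetch_html("https://example.com/", user: "alice", password: "secret")`, where the URL and credentials are made up for the example.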

## Usage

The scripts are collected here.

> * [DriftwoodJP/tools-screenshot](https://github.com/DriftwoodJP/tools-screenshot)

Captures are taken with [webkit2png](https://github.com/paulhammond/webkit2png/), so install it beforehand.

```prettyprinted
% brew info webkit2png
webkit2png: stable 0.7
http://www.paulhammond.org/webkit2png/
/usr/local/Cellar/webkit2png/0.7 (3 files, 28K) *
Built from source
From: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/webkit2png.rb
```

Parse the HTML fetched from the seed URIs and build the list of URIs to capture.

```prettyprinted
% ruby create_list.rb -i input.txt
```
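
The internals of `create_list.rb` are not shown here, so the following is only a guess at what the list-building step might involve. It assumes the `href` values have already been pulled out of the page (for example with nokogiri) and uses a hypothetical helper, `build_capture_list`, to resolve them against the seed URI, drop non-http links, and remove duplicates.

```ruby
require "uri"

# Resolve href values against the page they came from and return a unique,
# sorted list of absolute http(s) URIs as strings.
def build_capture_list(base_uri, hrefs)
  hrefs
    .compact                                  # anchors without href yield nil
    .map { |href| URI.join(base_uri, href) }  # relative -> absolute
    .select { |uri| uri.is_a?(URI::HTTP) }    # URI::HTTPS is a subclass of URI::HTTP
    .each { |uri| uri.fragment = nil }        # "#top" variants point at the same page
    .map(&:to_s)
    .uniq
    .sort
end
```

Writing the result out one URI per line would give a file in the shape the capture step expects, assuming that is the format `output.txt` uses.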

Capture the pages in that list.

```prettyprinted
% ruby capture_screen.rb output.txt
```
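
The capture step could be approximated as below. This is a sketch under assumptions rather than the contents of `capture_screen.rb`: the helper names, the `screenshots` directory, and the file-naming rule are invented for illustration, while `-F` (full-size image only), `-D` (output directory), and `-o` (base file name) are existing webkit2png options.

```ruby
require "uri"
require "fileutils"

# Build the argument list for one webkit2png run.
# The file name is derived from host and path (an assumed naming rule).
def webkit2png_command(uri, out_dir: "screenshots")
  parsed = URI.parse(uri)
  name = "#{parsed.host}#{parsed.path}".gsub(/[^A-Za-z0-9.-]+/, "_").sub(/_+\z/, "")
  ["webkit2png", "-F", "-D", out_dir, "-o", name, uri]
end

# Capture every non-blank line of the list file, one URI per line.
def capture_all(list_path, out_dir: "screenshots")
  FileUtils.mkdir_p(out_dir)
  File.readlines(list_path, chomp: true).reject(&:empty?).each do |uri|
    # Passing the arguments as separate strings avoids shell quoting issues.
    system(*webkit2png_command(uri, out_dir: out_dir)) or warn "capture failed: #{uri}"
  end
end
```

Building the command separately from running it keeps the naming rule easy to check without launching webkit2png.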

I split the work into separate steps to keep it flexible.
It looks like this will do the job.

## Addendum

When parsing with nokogiri I had been selecting links with `doc.css('a')`, which produced the following error.

```prettyprinted
…/uri/generic.rb:1203:in `rescue in merge': bad argument (expected URI object or URI string) (ArgumentError)
```

The failure came from anchors that use the [HTML a name Attribute](http://www.w3schools.com/tags/att_a_name.asp) and carry no `href`.
The selector has to be `doc.css('a[href]')`.

> * [Get link and href text from html doc with Nokogiri & Ruby? – Stack Overflow](http://stackoverflow.com/questions/9336039/get-link-and-href-text-from-html-doc-with-nokogiri-ruby)
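
The error can be reproduced with the standard library alone. In this sketch the `nil` stands in for what nokogiri returns from `a['href']` on an anchor such as `<a name="top">`, and `join_error` is a hypothetical helper that reports the exception message instead of raising.

```ruby
require "uri"

# Try to resolve href against base. Returns nil on success,
# or the ArgumentError message when URI.join rejects the argument.
def join_error(base, href)
  URI.join(base, href)
  nil
rescue ArgumentError => e
  e.message
end

puts join_error("https://example.com/docs/", "page.html").inspect
puts join_error("https://example.com/docs/", nil).inspect
```

Selecting `a[href]` means such `nil` values are never produced, so `URI.join` only ever receives strings.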
[/markdown]