pristine:ecb651ffa1215fba8e39308d2a60740d7096f5971b5e8d89f17616b1c704ce93 Starting with inventory: 0000005914-ed4aecfd889f904b1444d5910e3184c45011920aec2613405edad9f0dcdede30 [TAG 0.2 gwern0@gmail.com**20101130221339 Ignore-this: e0260c0103addd26b6cbac309270d889 ] hash: 0000002732-5582d40e78c8bb4753cc4ae4eaa2b2325186e5e2733f55c5d8718b32d2a88ca0 [archiver.hs: tweak down again for same reason gwern0@gmail.com**20101223023117 Ignore-this: 82da22ef83d3724db046fa72b3dbfdfd ] hash: 0000000308-b4f36c4314f61cd7167674c216825e1b8d13f68b3142a1d5dc4c9c9b3f3103e3 [testing reveals my IP is being banned, so back off a second gwern0@gmail.com**20110109223024 Ignore-this: f195e72206b76e818291e559a2ada213 ] hash: 0000000322-5159c5b323c50c3dd596cd193ed536df71b4f6755b3396a494fc5e5fb2f7d668 [archiver.hs: back off to 25 seconds, currently banned gwern0@gmail.com**20110129165257 Ignore-this: 3f4ac4dbf000b78ad3ceeb83798584ae ] hash: 0000000316-a931f822490a3b34f944730c88d43f1074939abf684b45747b9f50c1d4cb9ebf [archiver.hs: pick random entries rather than first one gwern0@gmail.com**20110405160221 Ignore-this: cc9e6c04f7c22024c45c47d07c621bd0 I noticed that going by alphabetic order was starving the end of the alphabet, and those URLs/domains were just building up endlessly; randomization is the usual solution to such coordination problems, and is easy to implement. ] hash: 0000001841-2ae65d0bbb613fce54cec6b93f94ff52af82e0793aff43a4f07ebddc34200d7d [archiver.cabal; update deps & bump version gwern0@gmail.com**20110405160337 Ignore-this: cdefa74ce820eeb4291071570ef322be ] hash: 0000000546-f50f6afe3880d9f2baa7a53398a182a5805cb11143d9507b886e85463ce814b9 [archiver.cabal: ln to additional documentation gwern0@gmail.com**20110409223715 Ignore-this: 5f3cd529e45ae057634c82b1956aba42 ] hash: 0000000684-61db1d0d1b7b576111b1c3aa3b9eb952fa353db56d92b169224ac062f9004b3f [archiver.hs: only archive & sleep on valid URIs gwern0@gmail.com**20110420210920 Ignore-this: 2f9d8a5471813fcac98b11bd6bf3cb0 This works around the behavior in WebArchiveBot where the addresses of local wiki pages like 'Modafinil#tolerance' are appended to ~/.urls.txt as well as all the http:// and https:// links. ] hash: 0000001127-41a92f80e6c14e96d39ebc367a239ce4d74be8208e0574d12e3b1cafedc4117d [archiver.hs: switch away from nub & sort to the usual Data.Set hack gwern0@gmail.com**20110505004512 Ignore-this: 7c6e1f3255e842eb20ee7bdbba8754b5 the length of the file input recently jumped to 44.5k URLs, at which point the List performance of `nub . sort` starts to become a problem - wasting extra time and using a lot of CPU-time/battery ] hash: 0000000795-3330d48e493b2f00dcf10509ce5e418c5f2cd88a1bde9a5758078d2fe6f0d7c9 [archiver.cabal: minor bump for Data.Set performance increase gwern0@gmail.com**20110517160441 Ignore-this: 309959ef282f12f262342c71a98258d8 ] hash: 0000000221-6a5c1a534f3ad97705c8e57ca8403d038a938d03eb0ed0697ba7e2f791b3717a [archiver.cabal: update URLs gwern0@gmail.com**20110524193808 Ignore-this: 95b115b8cd1d3b0e0368870e41cdd406 ] hash: 0000000498-e335eb09bb08a513f04acbb70f0bff6d98fff8d79d8be960ad4402bfc6b72778 [Archiver.hs: filter each archive's URLs gwern0@gmail.com**20110605182601 Ignore-this: f08f6d32694e00b8e6a9e9960dd6728f There is no point in asking Archive.org to archive an Archive.org address, or asking WebCite to archive a webcitation.org page; it's wasteful. (It makes sense to cross-archive, though, and that's permitted.) ] hash: 0000001634-8629cc31a6df633d1f73ecf7015c2ff415bd1ccbe6697267b52fd4ffd0d9f8c0 [bump version to 0.4 for new hook feature gwern0@gmail.com**20110615164458 Ignore-this: 74e791bfcb74d26a61eb9c4e914f39ce ] hash: 0000000201-5aaa17c038528957fd83830e20fef98a58509117cb89b57000821f333d36e250 [archiver.hs: add in a shell hook gwern0@gmail.com**20110615170712 Ignore-this: 7b6ce939a3b33f5e8704f3291546c1ed idea is that now I can do this: `screen -U -d -m -S "archiver" sh -c 'while true; do archiver ~/.urls.txt gwern0@gmail.com "cd ~/www && wget --continue --page-requisites --timestamping"; done'` While it's running, it incrementally downloads each URL; this relegates `local-archiver` (http://www.gwern.net/Archiving%20URLs#local-caching) to backup status - it should only be a backup to make sure `archiver` didn't miss any URLs. ] hash: 0000004365-4ea53c072b6d4f143434cc86e2f80409edf64f9e518c9c9e9d7f50cc546ae1c9 [+alexaToolbar archiver function; bump to 0.5 gwern0@gmail.com**20110621005101 Ignore-this: 1750e5d65586a0c9c4579534373ac035 ] hash: 0000002606-bcda2cb5a652c5da526e72723e20f11b880924619c93206072fe32a3af1d3c3b [Archiver.hs: slightly amend to follow tutorial gwern0@gmail.com**20110621012226 Ignore-this: 7e6a480bc35708f430c7ce8e5e3e391b ] hash: 0000000603-0645451aa6035bacae9d9e4f8eec4949a1dc9a0b9279a0abda46640be3ad513b [Archiver.hs: speculative new method for request IA archiving gwern0@gmail.com**20110709200756 Ignore-this: c5e34e9f77473c71e39f1a2d218db816 ] hash: 0000000902-98024239f3065503186b1c8da9b29b28485ca9dd1d30382e94ba3dc70f74fbad [archiver.hs: no reason to run the shell command unconditionally gwern0@gmail.com**20110730023801 Ignore-this: 269c38c83c2700e79cc4aed60a01e6ab ] hash: 0000000431-26c3429681ee47134f8cafd6c773a5a923733d9ae92b694ab597f1b666a96fe5 [archiver.hs: try to handle the 0 and 1 URLs cases (ironic I never ran into the base case before...) gwern0@gmail.com**20110810173135 Ignore-this: 2f2ce2349b7b348573a2c2fe6c95be36 ] hash: 0000000716-73eaa379c9cf57f33a8b488b076b5bfe527a2ff1d8ef0911317206f9035c30ba [archiver: wikiwix functionality gwern0@gmail.com**20110909030014 Ignore-this: b578976a700eae696a1f3471b94bc8b8 ] hash: 0000001304-7f22c9464f32b59a70cacf8e364a12c35ca8057a75bbd4455780a70a6f4258ff [Archiver.hs: toss in a random google search gwern0@gmail.com**20111006220955 Ignore-this: ecb441f7cc22b2b4cff809a4544e5e60 ] hash: 0000001377-6ee26b3c294184d75388170374d075bd5fd7d1d1b890ebe9d9379ddc610535e3 [archiver.hs: hold onto shell handle and kill it after the 28s delay - more than enough time to download gwern0@gmail.com**20120216013823 Ignore-this: 57553fb4d2980a41a06fca93e1e251c0 ] hash: 0000001522-09129349ff29c7ed5bd2e44975a0a14d736170e7255dbbd430b51ca504942b1b [Network.URL.Archiver: rm Alexa form because that page is dead and there is no replacement gwern0@gmail.com**20120307171316 Ignore-this: ef8190091059ca4d26a477fb301b41a5 ] hash: 0000003848-f2943892d7b814d557f33261e2d01bd43cfb3461df274e1c79fb528882345af7 [bump to 0.6: removing an alexa function is a major change gwern0@gmail.com**20120307171441 Ignore-this: f177196e948e4ed9116e880383053778 ] hash: 0000000220-caa4bf5517adbc07adbebde077da121461109f8092826ef8e503270b52d54b27 [+configurable timeout gwern0@gmail.com**20120715024702 Ignore-this: b10ad14dcbca2c35d0900d02be172f3f I should have made it configurable a long time ago instead of just tweaking the hardwired constant. ] hash: 0000004965-70e0c775f0885800146cc790dcb4d8568a89a93c0909e24f416197c6615a7863 [archiver.hs: threadDelay configuration was incomplete; clean up bytestring processing too gwern0@gmail.com**20130201232758 Ignore-this: 203fb174d6cbef83aa78d904e82fd4c0 ] hash: 0000003095-7e66dbabfc06bc87cb74e39177f9db3bf546805ce808f4292793fd0f7a076ed5 [bump gwern0@gmail.com**20130201232826 Ignore-this: e24089627719cca36e9c17dcd863ee34 ] hash: 0000000167-3866727cac256494b60cea3f2e93e122a8e8caba1256e1bba79c1e1825036100