Table of Contents
- [[related]]
- [* motivation](#mtvtn)
  - [2020-03-17] "GitHub blocked me and all my libraries" https://news.ycombinator.com/item?id=22593595
  - [[motivation for using gdpr/takeouts: convenient when you're migrating off the service? don't have to worry about regular exports]]
  - [2020-05-31] Own your content on Social Media using the IndieWeb - YouTube [[dataliberation]]
- [* implementation: goals/tips/practices](#mplmnttnglstpsprctcs)
  - [2019-10-03] another big goal is having little operational overhead. I'd rather set up a (potentially elaborate) system once and then never have to update it and think about how it works [[exports]] [[infra]]
  - [2021-03-04] Importance of agnostic exports: often you start backing up before you process the data
  - [[recommend checking the database to make sure it's got specific things you need]] [[backupchecker]]
  - [[synthetic style exports allow for defensive error handling – you can at least get data from the last state]] [[errors]]
  - [[eh. maybe get rid of colored logs for export process? presumably no one would look at them often]]
  - [[use submodule for common files, but release as a standalone package? I guess it's the best of both]]
  - [2020-01-01] ChromeDevTools/devtools-protocol: Chrome DevTools Protocol [[exports]] [[scrape]]
  - [2020-04-19] open files using utf-8 encoding (fixes #5) by miguelrochefort · Pull Request #6 · karlicoss/rexport
  - [[backup-wrapper is a more generic tool… basically running arb command and saving output with pattern]]
  - [[thinking about data providers]] [[dataliberation]]
  - [[dashboard for tokens + expose json or something so any language can have bindings]] [[infra]] [[exports]]
  - [[ok, exposing a stream is sort of good? and then filtering? makes it easier to use synthetic exports]] [[hpi]] [[exports]] [[dal]]
- [* twitter](#twttr) [[twitter]]
  - [[hmm. links that you get through search or API are shortened?]] [[linkrot]] [[twitter]] [[twint]]
  - [2021-01-19] bisguzar/twitter-scraper: Scrape the Twitter Frontend API without authentication. [[twitter]] [[exports]]
  - [[twint itself should work as incremental export… and then DAL should combine]] [[twint]]
  - [2019-07-28] jonbakerfish/TweetScraper: TweetScraper is a simple crawler/spider for Twitter Search without using API [[twitter]]
  - [[talon databases (lots of them!)]] [[hpi]] [[android]]
  - [2019-07-29] taspinar/twitterscraper: Scrape Twitter for Tweets
  - [[err, new twitter exports are half gig each?]]
  - [[twint: possibly missing reply things (with 'at')]] [[twint]] [[hpi]]
- [* hackernews](#hckrnws) [[hackernews]]
  - [2020-04-05] Our plan is for the next version of HN's API to simply serve a JSON version of e… | Hacker News [[hackernews]]
  - [2020-04-07] Profile: karlicoss | Hacker News
  - [2020-04-29] need to mirror HN… [[hackernews]] [[exports]]
  - [[materialistic – 'read' table]] [[promnesia]]
  - [2021-03-05] it's impressive that pretty much every tool for exporting has some flaws [[hackernews]]
  - [[HN data provider]] [[hpi]] [[orger]] [[promnesia]]
- [* google takeout/other google data](#ggltktthrggldt) [[takeout]]
  - [[wonder if it's possible to get watch position?]] [[takeout]] [[youtube]]
  - [[automating login & downloading]]
  - [2021-01-10] Hypothesis [[takeout]]
  - [[could sync mini-takeouts? with only necessary stuff picked from them]] [[takeout]]
  - [[youtube watch history – should be accumulated from multiple takeouts]] [[youtube]]
  - [2019-06-11] eh, recompressing to .tar.xz only saved 100 mb [[takeout]]
  - [[ugh, also when it's too large, they split archive in two]] [[takeout]]
  - [[also disappearing Discover/MyActivity??]] [[takeout]]
  - [2020-04-23] I've found Google Takeouts to silently remove old data | beepb00p
  - [2020-04-24] Takeout/My Activity/Search data is limited to last 10 years. Please remove limit - Google Search Community
  - [2020-04-29] > I’ve already pulled down my 2-300GB Google Photos archive How? I've tried sev… | Hacker News
  - [2020-05-04] I replied to a similar point about hashing here - https://news.ycombinator.com/i… | Hacker News
  - [2020-01-01] perkeep/gphotos-cdp: This program uses the Chrome DevTools Protocol to drive a Chrome session that downloads your photos stored in Google Photos. [[scrape]]
  - [2019-06-28] After hoarding over 50k YouTube videos, here is the youtube-dl command I settled on. : DataHoarder
  - [2020-01-01] perkeep/gphotos-cdp: This program uses the Chrome DevTools Protocol to drive a Chrome session that downloads your photos stored in Google Photos. https://github.com/perkeep/gphotos-cdp
- [* emfit sleep tracker](#mftslptrckr) [[emfit]]
  - [2018-08-18] Emfit has local API; would be nice to use it… [[emfit]]
  - [2019-12-17] downloadEmfitAPI.py https://gist.github.com/vanne02135/6901cc2b92315881080d0ce0f07c1a17
  - [[ugh. maybe autorefresh the token? Fucking hell.]] [[emfit]]
  - [[hmm, with emfit can code some sort of feedback tool which signals me to move when emfit loses signal]] [[emfit]]
  - [2020-05-29] emfit API didn't work for about three days straight… [[emfit]] [[backup]]
  - [2019-12-21] samuelmr/emfit-qs: Unofficial Node client for Emfit QS
- [* bluemaestro temperature sensor](#blmstrtmprtrsnsr) [[bluemaestro]]
  - [[figure out bluemaestro, make sure all merged]] [[bluemaestro]]
  - [[actually wonder if I can connect it to computer?]] [[bluemaestro]]
  - [[merge bluemaestros, plot separate environmental dashboard?]] [[dashboard]]
  - [[automate, about how I back up bluemaestro data]] [[toblog]]
  - [2019-09-29] yeah, could elaborate on backing up android data, could be quite generic? [[android]]
- [* reddit](#rddt) [[reddit]]
  - [[I think cool fact should just be converted into org mode from backups (merged!) but generally there is no point capturing them?]] [[reddit]]
  - [[I guess just rely on bleanser instead after all? Just make it less spammy]] [[bleanser]] [[reddit]]
  - [[Check for deleted favorites]] [[reddit]]
  - [[shit. need to bleanse reddit properly, otherwise looks like it's too much data…]] [[reddit]]
  - [[basically, just go through stuff that doesn't exist anymore but was in favorites ever (and suppress errors for some of them)]] [[reddit]]
  - [2020-01-11] Getting Started — PRAW 3.6.0 documentation [[reddit]]
- [* browser history](#brwsrhstry)
  - [[compress databases as xz? would save about half of the space at least, even more on firefox databases]] [[promnesia]]
  - [[cleanup firefox phone exports…]]
  - [[firefox history – db format has changed??]] [[hpi]] [[infra]]
  - [[firefox history – could compress with zstd? seems like 30x compression]] [[promnesia]]
  - [[firefox dev history]] [[phone]]
  - [2020-08-29] seanbreckenridge/ffexport: export and interface with firefox history/visits and site metadata
- [* hypothesis](#hypthss) [[hypothesis]]
- [* github](#gthb) [[github]]
  - [2020-02-01] motivation for github backups [[exports]]
  - [[warn about large repos?]] [[github]]
  - [[ghexport – read times out]] [[ghexport]]
  - [[500 error]] [[ghexport]]
  - [[backport old github backups to new format? should be enough to just wrap in 'events']] [[backup]] [[timeline]] [[promnesia]]
  - [[github – starred repos aren't updated??]]
- [* whatsapp](#whtspp) [[whatsapp]]
- [* stackexchange](#stckxchng) [[stackexchange]]
  - [2019-09-01] Usage of /users/{ids}/favorites GET - Stack Exchange API [[promnesia]]
  - [2019-09-16] shit. seems that no way to get upvoted posts… https://meta.stackexchange.com/questions/299264/how-to-get-the-list-of-all-posts-ive-upvoted-via-the-api
  - [2019-09-16] https://meta.stackexchange.com/questions/148008/how-can-i-see-comments-that-ive-upvoted
  - [2019-09-16] fuck. I guess I'm gonna have to scrape votes… https://stackoverflow.com/users/706389/karlicoss?tab=votes
  - [[stackexchange – there are comments in GDPR requested data]] [[stackexchange]]
  - [[stackexchange – shit]]
  - [[stackexchange – need to figure out how to import remaining data…]]
  - [[Today I would probably have tried parsing the Stack Exchange Data Dump instead.]]
  - [[hmm crashed on json decoding?]] [[stexport]]
- [* mastodon](#mstdn) [[mastodon]]
  - [2020-01-11] kensanata/mastodon-backup: Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
  - [[zigg/grabby: tools for scraping your Mastodon account data]] [[mastodon]]
  - [2019-12-29] halcy/Mastodon.py: Python wrapper for the Mastodon ( https://github.com/tootsuite/mastodon/ ) API. [[mastodon]]
  - [[tusky android app keeps some history in tuskyDb]] [[hpi]] [[mastodon]]
- [* pinboard](#pnbrd) [[pinboard]]
  - [[huh looks like pinboard is quite unstable with regards to backup… unless the backup script is wrong or something?]] [[bleanser]]
  - [2019-04-19] Pinboard on Twitter: "Next question is, does a raw API call give the same results as the website? The API and website search engine run off of different indexes.… https://t.co/CZrLE7YNWo" [[pinboard]]
- [[-–— other data sources ---]]
- [[Podcast addict data]]
- [2020-07-31] alexattia/Maps-Location-History: Get, Concatenate and Process you location history from Google Maps TimeLine [[location]] [[timeline]] [[qs]]
- [[ok, so need to preserve all (incl. older) versions of notebooks? dunno feels a bit excessive]] [[timeline]] [[remarkable]]
- [2020-10-25] Garmin Connect [[garmin]]
- [2020-12-30] Notice: This project is unmaintained · Issue #613 · fbchat-dev/fbchat [[facebook]]
- [[Need my email mirrored]] [[email]]
- [2019-06-13] joeyates/imap-backup: Backup GMail (or other IMAP) accounts to disk [[email]]
- [[Bandcamp history]]
- [2020-12-13] https://bandcamp.com/developer no listening history though…
- [[hmm memrise personal data request is neat! It's got all your training sessions + learned words and phrases]] [[publish]]
- [[do a full remarkable backup too?]] [[remarkable]]
- [[better docs on what to do on expiry]] [[monzo]]
- [[huh, thriva uses an api…]]
- [[call history from my old(er?) phones? (e.g. nokia)]]
- [[increase sample rate to 10 seconds maybe?]] [[arbtt]]
- [[process old 'backups' repo?]]
- [[reading hr data]] [[wahoo]]
- [2019-04-08] python - Steam API get historical player count of specific game - Stack Overflow
- [[Feedbin starred stuff]]
- [2019-07-14] fabianonline/telegrambackup: Java app to download all your telegram data.
- [[eh, should include older account? compare oldest and one of newer files..]] [[monzo]]
- [[myshows: hmm, so looks like api v 1.8 is deprecated, for api 2.0 I'd need to email them. can just use raw jsons from existing backup script]]
- [[compress chrome histories? would require backup script to compress it I suppose… maybe just go through them regularly and recompress]]
- [[bookmarks limit through api???]] [[instapaper]]
- [[gpslogger – add to backup checker??]] [[location]]
- [2020-10-03] Statify: Pull your playlist and listening data from the Spotify API to a Sqlite database /r/coolgithubprojects
- [[monzo export: make sure it works with original repo..]] [[exports]] [[monzo]]
- [[ugh, need to retrieve pinboard notes]] [[pinboard]] [[exports]]
- [2019-04-23] feedbin/feedbin-api: Feedbin API Documentation [[feedbin]]
- [2020-11-27] Success Stories · tcgoetz/GarminDB Wiki [[garmin]]
- [2020-12-19] Importing your Goodreads & Accessing them with Open Library’s APIs
- [2020-06-24] Telegram Now Lets You Export Your Chats, View Notification Exceptions | Technology News [[telegram]]
- [[get off the messages stored in old format and make sure nothing is missing, dedup?]] [[vk]]
- [[shit, they stopped you from accessing messages api. fuck.]] [[vk]]
- [[Headspace stats]] [[timeline]]
- [[.polar directory]] [[timeline]]
- [2019-09-02] vincaslt/memparse: A Memrise courses parser https://github.com/vincaslt/memparse
- [[skype call history?]]
- [[amazon orders history]]
- [[ugh, bookmarks method in api is not exhaustive (elif item.get("type") == 'bookmark')]] [[instapaper]]
- [2019-04-01] Polar AccessLink Api Daily Activity Goal /r/Polarfitness
- [[just reuse files dir? def no harm in it]] [[telegram]]
- [[blinkist: scrape off my highlights]]
- [[export bitbucket]]
- [[feedbin]]
- [2020-03-05] signalnerve/roam-backup: Automated Roam Research backups using GitHub Actions and AWS S3
- [[----]]
- [2020-02-03] Data lake - Wikipedia [[dal]] [[exports]]
- [2020-04-21] fucking hell. so materialistic export stopped working [[phone]]
- [[start awesome-exports list?]] [[exports]] [[publish]]
- [[script to grab files from downloads and move accordingly? e.g. for oyster statements]]
- TW at [2017-01-21] Playing with IMDB: thought I'd have to pull watchlist items out with Beautiful Soup, but there's a JSON blob sitting right in the React state [[exports]]
- [[Post about various ways of data handling]] [[toblog]] [[dataliberation]]
- [2019-12-27] 'hostage model' is a good term [[toblog]] [[dataliberation]] [[sadinfra]]
- [2020-01-15] Hi, Camlistore author here. Andrew Gerrand worked with me on Camlistore too and… | Hacker News [[infra]] [[exports]]
- [[automatic date extraction? could work, e.g. for rescuetime]] [[datetime]] [[backupchecker]]
- –— last housekeeping on [2021-02-06] --
- [2020-04-13] twintproject/twint: An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
- [2020-04-23] MatthieuBizien/roam-to-git: Automatic RoamResearch backup to Git
- [2020-04-28] timgrossmann/InstaPy: 📷 Instagram Bot - Tool for automated Instagram interactions
- [[right, so if you enable sync it seems to suck in history on the phone database? eh. messy]] [[firefox]] [[exports]] [[promnesia]]
- [[crap… android database has really high granularity of events??]] [[rescuetime]]
- [2021-02-05] Chiaki/VKBK: Profile backup & synchronization tool for Vk.com [[vk]] [[exports]]
- [2021-02-07] Against developer terms of service? · Issue #171 · Tyrrrz/DiscordChatExporter
- [[make it kinda smarter?]] [[backupchecker]]
- [[hmm, takeout has all tcx files?]] [[endomondo]]
- [[hide praw logs unless interactive? too spammy in syslog]] [[infra]]
- [[hm nice podcast addict simply backs up its database]] [[exports]]
- [[Hmm maybe need to check for similar dst problems… Basically mismatch between hr and sleep start/end?]] [[emfit]]
- [2021-02-25] ryanmcgrath/twython: Actively maintained, pure Python wrapper for the Twitter API. Supports both normal and streaming Twitter APIs. [[python]] [[twitter]]
- [2021-02-04] Privacy Policy - October 15, 2020 - Reddit [[reddit]] [[exports]]
- [2021-02-05] Rapptz/discord.py: An API wrapper for Discord written in Python. [[discord]] [[exports]]
- [2021-02-08] Oura ring vs. Emfit QS (My detailed comparison) - What do you think? - Quantified Self / Sports, Physical Activity, and Fitness - Quantified Self Forum [[emfit]] [[exports]]
- [[list the takeouts that are redundant]] [[takeout]] [[promnesia]]
- [[runnerup database file? could use existing computations perhaps?]]
- [[maybe for DAL, follow the pattern of exposing a method to read single export?]] [[hpi]] [[exports]]
- [[Could utilize monzo categories for mine? I guess they could have errors.. Idk]] [[monzo]]
- [2021-03-10] Quickstart — StackAPI 0.1.12 documentation [[exports]]
- [2021-03-07] Exporting my own comment content from Disqus? · Discuss Disqus · Disqus [[disqus]] [[exports]]
- [2021-03-23] Your eBay data [[ebay]]
- [[possible to have exactly same events with different API ids???]] [[github]]
- [2021-04-18] exobrain/data/exportsgdpr at master · seanbreckenridge/exobrain [[exports]] [[gdpr]]
- [[highlights are in UTC]] [[remarkable]] [[koreader]]
I need data exports to build tools around my personal data, and the actual process of exporting it from a silo is the first step.
After I export it I use it to build a 'data mirror'.
Here I mostly keep the notes about the data I haven't finished exporting.
The ones I have already/mostly finished are mentioned here:
related
. [[silo]]
. [[infra]]
. [[backup]]
. [[dataliberation]]
* motivation
Similar to backups.
[2020-03-17]
"GitHub blocked me and all my libraries" https://news.ycombinator.com/item?id=22593595
motivation for using gdpr/takeouts: convenient when you're migrating off the service? don't have to worry about regular exports
[2020-05-31]
Own your content on Social Media using the IndieWeb - YouTube [[dataliberation]]
* implementation: goals/tips/practices
[2019-10-03]
another big goal is having little operational overhead. I'd rather set up a (potentially elaborate) system once and then never have to update it and think about how it works [[exports]] [[infra]]
[2019-10-03]
that involves automatic ci [[ci]]
[2019-10-03]
continuous cloud sync [[cloud]]
[2019-10-03]
automation/cron jobs for orger [[dron]]
[2021-03-04]
Importance of agnostic exports: often you start backing up before you process the data
recommend checking the database to make sure it's got specific things you need [[backupchecker]]
synthetic style exports allow for defensive error handling – you can at least get data from the last state [[errors]]
eh. maybe get rid of colored logs for export process? presumably no one would look at them often
use submodule for common files, but release as a standalone package? I guess it's the best of both
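A minimal sketch of the "check the database" note above, assuming the export is a SQLite file. `check_export` and the table names are made up for illustration; this is not how backupchecker actually works, just one way such a check could look:

```python
import sqlite3
from pathlib import Path
from typing import Dict, List


def check_export(db_path: Path, required: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means the export looks sane.

    `required` maps a table name to the minimum number of rows expected in it.
    """
    problems = []
    # open read-only so the check can never modify the backup
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = {row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        for table, min_rows in required.items():
            if table not in tables:
                problems.append(f"missing table: {table}")
                continue
            # the table name comes from our own config, not from user input
            (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            if count < min_rows:
                problems.append(f"{table}: {count} rows, expected at least {min_rows}")
    finally:
        conn.close()
    return problems
```

Running it right after each export means a silently broken backup gets noticed the same day rather than when the data is finally processed.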
[2020-01-01]
ChromeDevTools/devtools-protocol: Chrome DevTools Protocol [[exports]] [[scrape]]
https://github.com/ChromeDevTools/devtools-protocol
[2020-04-19]
open files using utf-8 encoding (fixes #5) by miguelrochefort · Pull Request #6 · karlicoss/rexport
apply this to export helper…
backup-wrapper is a more generic tool… basically running arb command and saving output with pattern
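Roughly what "running an arbitrary command and saving output with a pattern" could look like. This is a sketch, not the actual backup-wrapper: `run_and_save` and the strftime-based pattern are assumptions for illustration.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def run_and_save(command: List[str], pattern: str, now: Optional[datetime] = None) -> Path:
    """Run `command` and write its stdout to a path built from `pattern`.

    `pattern` may contain strftime codes, e.g. 'exports/reddit/%Y%m%d%H%M%S.json'.
    """
    now = now or datetime.now(timezone.utc)
    target = Path(now.strftime(pattern))
    target.parent.mkdir(parents=True, exist_ok=True)
    # check=True: a failed export raises instead of leaving a bogus file behind
    result = subprocess.run(command, check=True, capture_output=True)
    # write under a temporary name first so a crash never leaves a half-written export
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(result.stdout)
    tmp.replace(target)
    return target
```

Because the wrapper only sees a command and its output, the same cron entry shape works for any exporter that prints to stdout.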
thinking about data providers [[dataliberation]]
Easiest option is just to have separate scripts to run regularly?
most users won't care about keeping historic data? Or maybe not keeping data at all? just provide lambda?
so the backup script could provide TODO
most users won't have cron set up?
so need a way to trigger backup from promnesia indexer itself? Fairly easy to achieve as it's all just python code?
to backup, use some python pattern library?
example how it could work:
in promnesia
def index_reddit():
    from exporters.reddit import export
    # TODO?
    return
dashboard for tokens + expose json or something so any language can have bindings [[infra]] [[exports]]
might be annoying to implement token retrieval on JS only?
[2020-04-12]
add this to myinfra repository??
[2020-05-27]
dunno, I'm a bit tired and not as motivated to build it… but could post so someone else picks up [[toblog]]
ok, exposing a stream is sort of good? and then filtering? makes it easier to use synthetic exports [[hpi]] [[exports]] [[dal]]
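A sketch of the "expose a stream, then filter" idea, combined with the defensive error handling mentioned earlier: bad records are yielded as exceptions instead of aborting the whole read. The names (`stream`, `only`) and the one-JSON-object-per-line input are assumptions, not an existing DAL interface.

```python
import json
from typing import Callable, Iterable, Iterator, Union

Item = dict
Res = Union[Item, Exception]


def stream(lines: Iterable[str]) -> Iterator[Res]:
    """Yield one parsed item per line; a bad line becomes an Exception, not a crash."""
    for lineno, line in enumerate(lines, start=1):
        try:
            yield json.loads(line)
        except json.JSONDecodeError as e:
            yield RuntimeError(f"line {lineno}: {e}")


def only(items: Iterable[Res], pred: Callable[[Item], bool]) -> Iterator[Res]:
    """Filter good items lazily; errors pass through so the caller still sees them."""
    for it in items:
        if isinstance(it, Exception) or pred(it):
            yield it
```

Since both functions are generators, nothing is loaded until the consumer iterates, and each consumer decides for itself whether an error is fatal.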
* twitter [[twitter]]
Twitter is a big pain in the ass, they've become very hostile towards API access.
Even the archives are somewhat incomplete (e.g. favorites lack some metadata).
E.g. from Apply for API — Twitter Developers
Be thorough
We need to completely understand your use case before we can approve it. So, please include as much detail as possible in your application.
hmm. links that you get through search or API are shortened? [[linkrot]] [[twitter]] [[twint]]
[2020-04-28]
shit.. also RTs are shortened?? so I need to get retweets properly?
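If the data source keeps the tweet's `entities` (Twitter API tweet objects carry `entities.urls` with `url` and `expanded_url`; whether a given scraper preserves them needs checking), the shortened links can be expanded offline. A sketch under that assumption, with a made-up function name:

```python
def expand_links(tweet: dict) -> str:
    """Return the tweet text with every t.co link replaced by its expanded form."""
    # extended-mode API responses use 'full_text', older ones use 'text'
    text = tweet.get("full_text") or tweet.get("text", "")
    for link in tweet.get("entities", {}).get("urls", []):
        text = text.replace(link["url"], link["expanded_url"])
    return text
```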
[2021-01-19]
bisguzar/twitter-scraper: Scrape the Twitter Frontend API without authentication. [[twitter]] [[exports]]
twint itself should work as incremental export… and then DAL should combine [[twint]]
Even though Twint uses db, they seem to treat it as temporary storage, so the schema might change.
I'm also not super convinced by how reliable the code is (from quick glance), so would worry about data loss.
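One way the "incremental export, then DAL combines" split could look: every run is kept as its own immutable export, and the combining happens at read time. A sketch only; `combine` and the `id` key are assumptions, and the items are plain dicts rather than Twint's actual schema.

```python
from typing import Iterable, List


def combine(exports: Iterable[Iterable[dict]], key: str = "id") -> List[dict]:
    """Merge exports given oldest first; a later copy of an item replaces an earlier one."""
    merged: dict = {}
    for export in exports:
        for item in export:
            merged[item[key]] = item
    return list(merged.values())
```

Keeping the raw exports untouched means a schema change or a scraper bug only affects the merge code, never the stored data.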
[2019-07-28]
jonbakerfish/TweetScraper: TweetScraper is a simple crawler/spider for Twitter Search without using API [[twitter]]
[2021-02-09]
doesn't work, this error :( https://github.com/bisguzar/twitter-scraper/issues/168
talon databases (lots of them!) [[hpi]] [[android]]
[2019-07-29]
taspinar/twitterscraper: Scrape Twitter for Tweets
One of the bigger disadvantages of the Search API is that you can only access Tweets written in the past 7 days. This is a major bottleneck for anyone looking for older past data to make a model from. With TwitterScraper there is no such limitation.
[2021-02-09]
https://github.com/taspinar/twitterscraper/issues/344 broken as well
err, new twitter exports are half gig each?
twint: possibly missing reply things (with 'at') [[twint]] [[hpi]]
compare tw-before.org (twint) and tw-after.org (twidump) in views
retweets in twint are def missing
* hackernews [[hackernews]]
[2020-04-05]
Our plan is for the next version of HN's API to simply serve a JSON version of e… | Hacker News [[hackernews]]
https://news.ycombinator.com/item?id=22788526
Our plan is for the next version of HN's API to simply serve a JSON version of every page. I'm hoping to get to that this year.
[2020-04-07]
Profile: karlicoss | Hacker News
https://news.ycombinator.com/user?id=karlicoss
user: karlicoss
created: August 25, 2016
karma: 757
capture HN karma? maybe on all comments
[2020-04-29]
need to mirror HN… [[hackernews]] [[exports]]
materialistic – 'read' table [[promnesia]]
could also have 'exact' time notion and 'approximate' time – when it's guessed from the file timestamp etc
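The exact/approximate distinction could be carried alongside the timestamp itself. A small sketch; the `When` type and `when_from_file` helper are hypothetical names:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class When:
    dt: datetime
    exact: bool  # False when guessed, e.g. from a file's modification time


def when_from_file(path: Path) -> When:
    """Fallback for data without its own timestamp: use the file's mtime."""
    mtime = path.stat().st_mtime
    return When(dt=datetime.fromtimestamp(mtime, tz=timezone.utc), exact=False)
```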
[2021-03-05]
it's impressive that pretty much every tool for exporting has some flaws [[hackernews]]
don't have ci
- https://github.com/davenicoll/hackernews
  - doesn't even have main??
- https://github.com/romaintailhurat/hns
  - uses pickle??
- https://github.com/amjd/HN-Saved-Links-Export
  - too defensive
  - writes to stdout
  - can't be used as API
HN data provider [[hpi]] [[orger]] [[promnesia]]
https://github.com/HackerNews/API
https://hacker-news.firebaseio.com/v0/user/karlicoss.json?print=pretty – get user data
extract 'submitted'
https://hacker-news.firebaseio.com/v0/item/25971799.json?print=pretty – comment
https://hacker-news.firebaseio.com/v0/item/25971380.json?print=pretty – type: "story"
dunno if useful to keep scores over time?
not sure if should dump everything in a single json? or split by files?
can change later I guess
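A sketch of how the provider could walk the endpoints listed above: fetch the user, take `submitted`, then fetch each item. The function names are made up, and `fetch` is injectable so the logic can be exercised without hitting the network.

```python
import json
from typing import Any, Callable, Iterator
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def get_json(url: str) -> Any:
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def export_user(username: str, fetch: Callable[[str], Any] = get_json) -> Iterator[dict]:
    """Yield every item (story, comment, ...) listed in the user's 'submitted' field."""
    user = fetch(f"{BASE}/user/{username}.json")
    for item_id in user.get("submitted", []):
        item = fetch(f"{BASE}/item/{item_id}.json")
        if item is not None:  # defensive: skip an empty (null) response
            yield item
```

Yielding items one by one leaves the single-file vs split-by-file question open: the caller can dump everything into one JSON or write each item separately.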
* google takeout/other google data [[takeout]]
Google Takeout doesn't have a proper API, and periodic exports are kind of annoying… would be good to automate them.
Another difficulty is that the data seems to have a certain retention,
so you can't just take the latest takeout, for some data you need to merge all of them.
wonder if it's possible to get watch position? [[takeout]] [[youtube]]
automating login & downloading
[2019-09-28]
life-vault/selenium_takeout.py at master · ThorbenJensen/life-vault https://github.com/ThorbenJensen/life-vault/blob/master/src/takeout/selenium_takeout.py
automating google drive [[takeout]] [[backup]] [[exports]]
- ocamlfuse + script to move to desired location
- basically that only requires you to request new archive occasionally
automate google takeouts?
maybe release my module for 2FA separately?
https://github.com/ThorbenJensen/life-vault/blob/master/src/takeout/selenium_takeout.py
[2021-01-10]
Hypothesis [[takeout]]
Seriously, check out ratarmount if you haven't. Since the Google Takeout spans multiple 50GB tgz files (I'm at ~14, not including Google Drive in the takeout), ratarmount is brilliant. It merges all of the tgz contents into a single folder structure so /path/a/1.jpg and /path/a/1.json might be in different tgz folders but are mounted in to the same folder.
could sync mini-takeouts? with only necessary stuff picked from them [[takeout]]
youtube watch history – should be accumulated from multiple takeouts [[youtube]]
[2019-06-11]
eh, recompressing to .tar.xz only saved 100 mb [[takeout]]
ugh, also when it's too large, they split archive in two [[takeout]]
also disappearing Discover/MyActivity?? [[takeout]]
20180807 My Activity/Discover/MyActivity.html 20190523
20181015 My Activity/Discover/MyActivity.html 20190522
20181213 My Activity/Discover/MyActivity.html 20200122
[2020-04-23]
I've found Google Takeouts to silently remove old data | beepb00p
huh, so with my script to search takeout duplicates, I've figured out that from 2015 at least Search/MyActivity.html hasn't been erased? interesting
but looks like Chrome/MyActivity.html still being removed
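One way a "search takeout duplicates" script could work: hash the same member file in every archive and group archives by content. This is a sketch and not the actual script; it assumes the takeouts are kept as .zip files, and `content_hashes` is a made-up name.

```python
import hashlib
import zipfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def content_hashes(archives: Iterable[Path], member: str) -> Dict[str, List[str]]:
    """Map the sha256 of `member` to the names of archives holding that exact content."""
    seen = defaultdict(list)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            if member not in zf.namelist():
                # the file disappeared from this takeout altogether
                seen["<missing>"].append(archive.name)
                continue
            digest = hashlib.sha256(zf.read(member)).hexdigest()
            seen[digest].append(archive.name)
    return dict(seen)
```

Archives that share a digest carry identical copies of the file, so all but one are redundant for that file; archives under `<missing>` are the ones where it was dropped.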
[2020-04-24]
Takeout/My Activity/Search data is limited to last 10 years. Please remove limit - Google Search Community
[2020-04-29]
> I’ve already pulled down my 2-300GB Google Photos archive How? I've tried sev… | Hacker News
cuu508 1 hour ago [-]
Takeout doesn't work in practice for bigger collections (archive creation routinely fails, timeouts while downloading, 50GB max size results in many splits)
I've used this 3rd party tool and it worked OK: https://github.com/gilesknap/gphotos-sync/
geekgonecrazy 1 hour ago [-]
I forgot to mention this. But yes the export failed several dozen times. I believe I ended up doing in chunks. It was hard to get them off
[2020-05-04]
I replied to a similar point about hashing here - https://news.ycombinator.com/i… | Hacker News
You're correct that the methods I described are a far cry from actually guaranteeing that the backup has no errors. In the same way that a unit test doesn't prove code is error-free, but _can_ justify increased confidence in the code, I'm interested in techniques that can justify increased confidence in my backups. Particularly in cases where I don't have direct access to the original data, and where exhaustively checking the data manually is too time-consuming to be worth it.
yes!
[2020-01-01]
perkeep/gphotos-cdp: This program uses the Chrome DevTools Protocol to drive a Chrome session that downloads your photos stored in Google Photos. [[scrape]]
https://github.com/perkeep/gphotos-cdp
In our original Perkeep issue, @bradfitz said that we might have to give up on APIs and resort to scraping, noting that the Chrome DevTools Protocol makes this pretty easy.
[2019-06-28]
After hoarding over 50k YouTube videos, here is the youtube-dl command I settled on. : DataHoarder
https://www.reddit.com/r/DataHoarder/comments/c6fh4x/after_hoarding_over_50k_youtube_videos_here_is/
[2020-01-01]
perkeep/gphotos-cdp: This program uses the Chrome DevTools Protocol to drive a Chrome session that downloads your photos stored in Google Photos. https://github.com/perkeep/gphotos-cdp
we'd like our photos mirrored in seconds or minutes, not weeks.
* emfit sleep tracker [[emfit]]
Emfit QS is my sleep tracker.
[2018-08-18]
Emfit has local API; would be nice to use it… [[emfit]]
https://gist.github.com/harperreed/9d063322eb84e88bc2d0580885011bdd
https://gist.github.com/karlicoss/3361f6a239048a451daa2a02982ee180
[2020-09-11]
sanielfishawy/emfitdatagetter: Gets heart rate and respiration rate from an Emfit QS device on the same local network. [[emfit]]
[2019-12-17]
downloadEmfitAPI.py https://gist.github.com/vanne02135/6901cc2b92315881080d0ce0f07c1a17
ugh. maybe autorefresh the token? Fucking hell. [[emfit]]
[2021-02-06]
I think I ended up just using login + password. meh
hmm, with emfit can code some sort of feedback tool which signals me to move when emfit loses signal [[emfit]]
[2020-05-29]
emfit API didn't work for about three days straight… [[emfit]] [[backup]]
[2019-12-21]
samuelmr/emfit-qs: Unofficial Node client for Emfit QS
https://github.com/samuelmr/emfit-qs
Exchange username and password to a token (expires in 7 days). You can also log in to qs.emfit.com and check the ´remember_token´ parameter passed to API calls (e.g. with developer tools of your browser).
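Given the 7-day expiry from the quote above, auto-refreshing could be as simple as caching the token with its expiry time. A sketch only: `TokenCache` is a made-up helper, and `login` stands in for whatever call exchanges username and password for a token.

```python
import time
from typing import Callable, Optional

WEEK = 7 * 24 * 3600  # token lifetime in seconds, per the README quoted above


class TokenCache:
    def __init__(self, login: Callable[[], str], lifetime: float = WEEK,
                 margin: float = 3600, clock: Callable[[], float] = time.time) -> None:
        self._login = login        # hypothetical: exchanges credentials for a token
        self._lifetime = lifetime
        self._margin = margin      # refresh a bit early rather than fail mid-export
        self._clock = clock
        self._token: Optional[str] = None
        self._expires = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires - self._margin:
            self._token = self._login()
            self._expires = now + self._lifetime
        return self._token
```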
* bluemaestro temperature sensor [[bluemaestro]]
figure out bluemaestro, make sure all merged [[bluemaestro]]
- State "STRT" from "TODO"
[2019-03-12]
actually wonder if I can connect it to computer? [[bluemaestro]]
merge bluemaestros, plot separate environmental dashboard? [[dashboard]]
automate, about how I back up bluemaestro data [[toblog]]
[2019-09-29]
yeah, could elaborate on backing up android data, could be quite generic? [[android]]
* reddit [[reddit]]
I think cool fact should just be converted into org mode from backups (merged!) but generally there is no point capturing them? [[reddit]]
[2019-09-10]
er, I guess for orger need to extract a simple reddit provider that just merges various timestamped backups?
I guess just rely on bleanser instead after all? Just make it less spammy [[bleanser]] [[reddit]]
Check for deleted favorites [[reddit]]
- State "STRT" from "TODO"
[2019-03-23]
[2019-08-25]
yep, it def happens; promnesia triggers it
shit. need to bleanse reddit properly, otherwise looks like it's too much data… [[reddit]]
basically, just go through stuff that doesn't exist anymore but was in favorites ever (and suppress errors for some of them) [[reddit]]
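The "was in favorites ever, but doesn't exist anymore" check is a set difference across timestamped backups. A sketch with a hypothetical `vanished` helper, assuming each backup is a list of items keyed by `id`:

```python
from typing import Iterable, List


def vanished(backups: Iterable[Iterable[dict]], key: str = "id") -> List[dict]:
    """Items present in some earlier backup but absent from the latest one."""
    ever: dict = {}
    latest_ids: set = set()
    for backup in backups:  # oldest first
        latest_ids = set()
        for item in backup:
            ever[item[key]] = item
            latest_ids.add(item[key])
    return [item for k, item in ever.items() if k not in latest_ids]
```

An item missing from the latest backup is not necessarily deleted; it may just have dropped out of a truncated listing, so the result is a list of candidates to verify.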
[2020-01-11]
Getting Started — PRAW 3.6.0 documentation [[reddit]]
https://praw.readthedocs.io/en/v3.6.0/pages/getting_started.html#connecting-to-reddit
You may also have realized that the karma values change from run to run. This inconsistency is due to reddit’s obfuscation of the upvotes and downvotes. The obfuscation is done to everything and everybody to thwart potential cheaters. There’s nothing we can do to prevent this.
* browser history
compress databases as xz? would save about half of the space at least, even more on firefox databases [[promnesia]]
[2020-09-05]
probably not necessary with pruning
cleanup firefox phone exports…
firefox history – db format has changed?? [[hpi]] [[infra]]
firefox history – could compress with zstd? seems like 30x compression [[promnesia]]
[2020-06-10]
to start with – simply compress locally once the db is synced, will think about doing something smarter later
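A sketch of "compress locally once the db is synced". To stay dependency-free it uses xz via the stdlib `lzma` module rather than zstd, and it verifies the round trip before optionally deleting the original. `compress_xz` is a made-up name.

```python
import hashlib
import lzma
import shutil
from pathlib import Path


def _digest(fobj) -> str:
    h = hashlib.sha256()
    for chunk in iter(lambda: fobj.read(1 << 20), b""):
        h.update(chunk)
    return h.hexdigest()


def compress_xz(src: Path, keep_original: bool = True) -> Path:
    """Write '<src>.xz' next to the original and return the new path."""
    dst = src.with_name(src.name + ".xz")
    with src.open("rb") as fin, lzma.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # decompress again and compare hashes before trusting the archive
    with src.open("rb") as a, lzma.open(dst, "rb") as b:
        same = _digest(a) == _digest(b)
    if not same:
        dst.unlink()
        raise RuntimeError(f"round-trip mismatch for {src}")
    if not keep_original:
        src.unlink()
    return dst
```

This should only run on a database copy that is no longer being written to, which is why the note ties it to the moment after syncing.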
firefox dev history [[phone]]
[2020-08-29]
seanbreckenridge/ffexport: export and interface with firefox history/visits and site metadata
* hypothesis [[hypothesis]]
hmm, 9000 limit? might be necessary to do synthetic export instead… [[hypothesis]]
Hypothesis API are cloned as well. [[hypothesis]]
[2020-01-21]
fix in hypexport?
* github [[github]]
[2020-02-01]
motivation for github backups [[exports]]
> if the official repo is taken down, your forks will disappear unless you have a copy.
https://help.github.com/en/github/collaborating-with-issues-...
I don't think that's true, I've personally recovered deleted repositories by finding its forks.
edit: Ah never mind it seems things work differently in the case of DMCA takedowns
warn about large repos? [[github]]
ghexport – read times out [[ghexport]]
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Read timed out. (read timeout=15)
500 error [[ghexport]]
  File "/home/karlicos/.local/lib/python3.7/site-packages/github/Requester.py", line 276, in requestJsonAndCheck
    return self.__check(*self.requestJson(verb, url, parameters, headers, input, self.__customConnection(url)))
  File "/home/karlicos/.local/lib/python3.7/site-packages/github/Requester.py", line 287, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 500 None
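Both the read timeout and the 500 above look transient, so a retry with exponential backoff is one plausible mitigation. A generic sketch; `with_retries` is a made-up helper and not something ghexport provides.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[], T], retry_on: Tuple[Type[BaseException], ...],
                 attempts: int = 5, base_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(); on a listed exception wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for n in range(attempts):
        try:
            return func()
        except retry_on:
            if n == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            sleep(base_delay * 2 ** n)
    raise RuntimeError("unreachable")
```

For the logs above, `retry_on` would be the two exception classes that appear in them, `requests.exceptions.ReadTimeout` and `github.GithubException`; retrying every 5xx blindly may hide a real outage, so the attempt count is kept small.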
backport old github backups to new format? should be enough to just wrap in 'events' [[backup]] [[timeline]] [[promnesia]]
github – starred repos aren't updated??
* whatsapp [[whatsapp]]
I don't really use it and it's pretty hostile so unlikely I'll bother.
/data/data/com.whatsapp/databases/msgstore.db [[whatsapp]]
actually has messages!
[2020-01-17]
MasterScrat/Chatistics: 💬 Python scripts to parse your Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames. [[whatsapp]]
https://github.com/MasterScrat/Chatistics
Unfortunately, WhatsApp only lets you export your conversations from your phone and one by one.
On your phone, open the chat conversation you want to export
On Android, tap on ⋮ > More > Export chat. On iOS, tap on the interlocutor's name > Export chat
Choose "Without Media"
Send chat to yourself eg via Email
Unpack the archive and add the individual .txt files to the folder ./raw_data/whatsapp/
[2019-07-13]
tgalal/yowsup: The WhatsApp lib https://github.com/tgalal/yowsup
It seems that recently yowsup gets detected during registration resulting in an instant ban for your number right after registering with the code you receive by sms/voice. I'd strongly recommend to not attempt registration through yowsup until I look further into this. Follow the status of this here.
* stackexchange [[stackexchange]]
[2019-09-01]
Usage of /users/{ids}/favorites GET - Stack Exchange API [[promnesia]]
https://api.stackexchange.com/docs/favorites-on-users
Usage of /users/{ids}/favorites GET
Discussion
Get the questions that users in {ids} have favorited.
This method is effectively a view onto a user's favorites tab.
{ids} can contain up to 100 semicolon delimited ids. To find ids programmatically look for user_id on user or shallow_user objects.
The sorts accepted by this method operate on the following fields of the question object:
activity – last_activity_date
creation – creation_date
votes – score
added – when the user favorited the question
activity is the default sort.
It is possible to create moderately complex queries using sort, min, max, fromdate, and todate.
This method returns a list of questions.
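A small sketch of how a request URL for this endpoint could be assembled from the description above: semicolon-delimited ids (at most 100) and one of the four listed sorts. The `/2.2/` version prefix is taken from the logged URLs elsewhere in these notes; `favorites_url` and its parameters are hypothetical, and nothing here performs a request.

```python
from typing import Iterable
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"  # version as seen in logged URLs
SORTS = ("activity", "creation", "votes", "added")  # sorts listed in the docs quote


def favorites_url(ids: Iterable[int], site: str, sort: str = "activity") -> str:
    """Build the URL for /users/{ids}/favorites (does not send anything)."""
    id_list = [str(i) for i in ids]
    if not 1 <= len(id_list) <= 100:
        raise ValueError("{ids} must contain between 1 and 100 ids")
    if sort not in SORTS:
        raise ValueError(f"unknown sort: {sort}")
    query = urlencode({"site": site, "sort": sort})
    return f"{API_ROOT}/users/{';'.join(id_list)}/favorites?{query}"
```

For example, `favorites_url([706389], "stackoverflow", sort="added")` orders the result by when the question was favorited.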
[2019-09-16]
shit. seems there's no way to get upvoted posts… https://meta.stackexchange.com/questions/299264/how-to-get-the-list-of-all-posts-ive-upvoted-via-the-api
[2019-09-16]
https://meta.stackexchange.com/questions/148008/how-can-i-see-comments-that-ive-upvoted
[2019-09-16]
fuck. I guess I'm gonna have to scrape votes… https://stackoverflow.com/users/706389/karlicoss?tab=votes
stackexchange – there are comments in GDPR requested data [[stackexchange]]
stackexchange – shit
ERROR:stexport:Giving up fetch_backoff(...) after 1 tries (stackapi.stackapi.StackAPIError: ('https://api.stackexchange.com/2.2/users/706389/privileges/?pagesize=100&page=1&filter=%21LVBj2%28M0Wr1s_VedzkH%28VG&site=alcohol.meta', 502, 'throttle_violation', 'too many requests from this IP, more requests available in 50511 seconds')
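The throttle message above states how long to wait (50511 seconds, roughly 14 hours), so an export script could read that number and stop early instead of retrying. A minimal sketch, assuming the message wording stays as in the log line above; `throttle_wait_seconds` is a hypothetical helper, not part of stexport or StackAPI.

```python
import re
from typing import Optional

# Wording copied from the error message in the log above.
_WAIT_RE = re.compile(r"more requests available in (\d+) seconds")


def throttle_wait_seconds(message: str) -> Optional[int]:
    """Extract the suggested wait from a throttle_violation message, if present."""
    match = _WAIT_RE.search(message)
    return int(match.group(1)) if match else None
```

With a wait this long, the sensible reaction is probably to abort the run and let the next scheduled export pick up, rather than sleeping in-process.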
stackexchange – need to figure out how to import remaining data…
Today I would probably have tried parsing the Stack Exchange Data Dump instead.
Todo promnesia
from ip Lessons learned from writing ShellCheck, GitHub’s now most starred Haskell project – Vidar's Blog
[2021-02-06]
hmm, it's actual dump of all comments… bit too much I guess
hmm crashed on json decoding? [[stexport]]
[INFO stexport 2021-03-10 08:33:48,004 export.py:161] exporting dsp: users/{ids}/favorites
[INFO stexport 2021-03-10 08:33:48,302 _common.py:86] Backing off fetch_backoff(...) for 0.5s (stackapi.stackapi.StackAPIError: ('https://api.stackexchange.com/2.2/users/706389/comments/?pagesize=100&page=1&filter=%21LVBj2%28M0Wr1s_VedzkH%28VG&&site=dsp', "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')", "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')", "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')"))
[ERROR stexport 2021-03-10 08:33:49,124 _common.py:101] Giving up fetch_backoff(...) after 2 tries (stackapi.stackapi.StackAPIError: ('https://api.stackexchange.com/2.2/users/706389/favorites/?pagesize=100&page=1&filter=%21LVBj2%28M0Wr1s_VedzkH%28VG&site=dsp', 'Expecting value: line 1 column 1 (char 0)', 'Expecting value: line 1 column 1 (char 0)', 'Expecting value: line 1 column 1 (char 0)'))
Traceback (most recent call last):
File "/home/adhoc/.local/lib/python3.8/site-packages/stackapi/stackapi.py", line 204, in fetch
response = response.json()
File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
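"Expecting value: line 1 column 1 (char 0)" means the body was empty or not JSON at all (for instance an HTML error page). A hedged sketch of decoding defensively so the log shows what actually came back; `try_decode` is a hypothetical helper and is not how stackapi handles it.

```python
import json
from typing import Any, Optional, Tuple


def try_decode(body: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (parsed, None) on success or (None, error description) on failure."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as e:
        # Keep a short preview of the body so the log explains the failure.
        preview = body[:80]
        return None, f"{e.msg} at char {e.pos}; body starts with {preview!r}"
```

Returning the error instead of raising lets the caller decide whether to retry, skip the endpoint, or give up.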
* mastodon [[mastodon]]
[2020-01-11]
kensanata/mastodon-backup: Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
https://github.com/kensanata/mastodon-backup
Thus, if every request gets 20 toots, then we can get at most 6000 toots per five minutes.
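The arithmetic behind that quote, spelled out: 6000 toots at 20 per request implies 300 requests per five-minute window. The sketch below estimates how many windows a full archive needs; the helper names are hypothetical and the numbers are just the ones from the quote.

```python
import math

TOOTS_PER_REQUEST = 20   # from the quote above
TOOTS_PER_WINDOW = 6000  # from the quote above
WINDOW_MINUTES = 5

# Implied request budget per window: 6000 / 20
REQUESTS_PER_WINDOW = TOOTS_PER_WINDOW // TOOTS_PER_REQUEST


def windows_needed(total_toots: int) -> int:
    """Number of five-minute windows needed to fetch total_toots."""
    return math.ceil(total_toots / TOOTS_PER_WINDOW)


def min_wait_minutes(total_toots: int) -> int:
    """Minimum time spent waiting for new windows (the first window is free)."""
    return max(windows_needed(total_toots) - 1, 0) * WINDOW_MINUTES
```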
zigg/grabby: tools for scraping your Mastodon account data [[mastodon]]
https://github.com/zigg/grabby
[2019-12-29]
halcy/Mastodon.py: Python wrapper for the Mastodon ( https://github.com/tootsuite/mastodon/ ) API. [[mastodon]]
https://github.com/halcy/Mastodon.py
tusky android app keeps some history in tuskyDb [[hpi]] [[mastodon]]
* pinboard [[pinboard]]
huh looks like pinboard is quite unstable with regards to backup… unless the backup script is wrong or something? [[bleanser]]
[2019-04-19]
Pinboard on Twitter: "Next question is, does a raw API call give the same results as the website? The API and website search engine run off of different indexes.… https://t.co/CZrLE7YNWo" [[pinboard]]
<https://twitter.com/Pinboard/status/1113807174717792256>
Next question is, does a raw API call give the same results as the website? The API and website search engine run off of different indexes.
* other data sources
[2020-04-13]
twintproject/twint: An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
[2020-04-23]
MatthieuBizien/roam-to-git: Automatic RoamResearch backup to Git
Format [[links]]
Format #links
Format attribute::
Format [[ [[link 1]] [[link 2]] ]]
Format ((link))
[2020-04-28]
timgrossmann/InstaPy: 📷 Instagram Bot - Tool for automated Instagram interactions
right, so if you enable sync it seems to suck in history on the phone database? eh. messy [[firefox]] [[exports]] [[promnesia]]
crap… android database has really high granularity of events?? [[rescuetime]]
[2021-02-05]
Chiaki/VKBK: Инструмент для создания и синхронизации локального бэкапа вашего профиля ВКонтакте (Profile backup & synchronization tool for Vk.com) [[vk]] [[exports]]
ugh fuck.. apache & mysql? a bit much for me :(
[2021-02-07]
Against developer terms of service? · Issue #171 · Tyrrrz/DiscordChatExporter
make it kinda smarter? [[backupchecker]]
if it's a single file, don't do anything just yet?
or treat it as 'simple' with month duration or something
just so it doesn't warn immediately. could be a takeout archive or something
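One way the "single file" idea could look, as a sketch only: a directory with exactly one backup gets a grace period of about a month before it counts as stale. `is_stale`, `SINGLE_FILE_GRACE` and the 30-day value are all hypothetical; this is not how backupchecker currently works.

```python
from datetime import datetime, timedelta
from typing import List

# "month duration" from the note above; the exact value is a guess
SINGLE_FILE_GRACE = timedelta(days=30)


def is_stale(backup_times: List[datetime], now: datetime, max_age: timedelta) -> bool:
    """Decide whether to warn about a backup directory.

    A lone file (e.g. a one-off takeout archive) is only flagged once it is
    older than both max_age and the grace period.
    """
    if not backup_times:
        return True  # nothing backed up at all
    allowed = max(max_age, SINGLE_FILE_GRACE) if len(backup_times) == 1 else max_age
    return now - max(backup_times) > allowed
```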
hmm, takeout has all tcx files? [[endomondo]]
hide praw logs unless interactive? too spammy in syslog [[infra]]
hm nice podcast addict simply backs up its database [[exports]]
(although it only maintains two?)
Hmm maybe need to check for similar dst problems… Basically mismatch between hr and sleep start/end? [[emfit]]
[2021-02-25]
ryanmcgrath/twython: Actively maintained, pure Python wrapper for the Twitter API. Supports both normal and streaming Twitter APIs. [[python]] [[twitter]]
hmm still working? nice…
[2021-02-04]
Privacy Policy - October 15, 2020 - Reddit [[reddit]] [[exports]]
ugh. gdpr takeout has to be emailed?
[2021-02-05]
Rapptz/discord.py: An API wrapper for Discord written in Python. [[discord]] [[exports]]
[2021-02-08]
Oura ring vs. Emfit QS (My detailed comparison) - What do you think? - Quantified Self / Sports, Physical Activity, and Fitness - Quantified Self Forum [[emfit]] [[exports]]
Can only store 10 hours of data on the device & 360 days in the cloud
huh? motivation for exports I guess
list the takeouts that are redundant [[takeout]] [[promnesia]]
runnerup database file? could use existing computations perhaps?
maybe for DAL, follow the pattern of exposing a method to read single export? [[hpi]] [[exports]]
so it could cooperate with HPI… egh not sure
Could utilize monzo categories for mine? I guess they could have errors.. Idk [[monzo]]
[2021-03-10]
Quickstart — StackAPI 0.1.12 documentation [[exports]]
By default, StackAPI will return up to 500 items in a single call. It may be less than this, if there are less than 500 items to return. This is common on new or low traffic sites.
The number of results can be modified by changing the page_size and max_pages values. These are multiplied together to get the maximum total number of results. The API paginates the results and StackAPI recombines those pages into a single result.
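The multiplication described above, as a tiny sketch. The helper names are hypothetical; the 500 default is consistent with a page size of 100 and 5 pages, which is my assumption about how it splits.

```python
import math


def max_results(page_size: int, max_pages: int) -> int:
    """Upper bound on items returned by one call: page_size * max_pages."""
    return page_size * max_pages


def pages_needed(total_items: int, page_size: int) -> int:
    """How many pages max_pages must allow to cover total_items."""
    return math.ceil(total_items / page_size)


# With StackAPI itself, the two knobs are attributes on the site object
# (as I understand the quickstart; not executed here):
#   SITE = StackAPI("stackoverflow")
#   SITE.page_size = 100
#   SITE.max_pages = pages_needed(1234, 100)
```

For a full export this matters because a call that hits the cap returns a truncated result without failing.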
[2021-03-07]
Exporting my own comment content from Disqus? · Discuss Disqus · Disqus [[disqus]] [[exports]]
seems hostile against exporting your own data
[2021-03-23]
Your eBay data [[ebay]]
can request data takeout here… takes ages to complete though, like a week
possible to have exactly same events with different API ids??? [[github]]
vimdiff <(rg -A 363 -B 1 15538293160 events_20210317T120954Z.json) <(rg -A 363 -B 1 15538293166 events_20210317T120954Z.json)
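The vimdiff above compares two events by hand; the same check could be automated by fingerprinting each event with its id removed. A sketch, assuming events are dicts with a top-level `id` field; `duplicates_ignoring_id` is a hypothetical helper and the export's actual layout is not verified here.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def duplicates_ignoring_id(
    events: List[Dict[str, Any]], id_key: str = "id"
) -> List[List[Any]]:
    """Group ids of events that are identical apart from id_key."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for event in events:
        rest = {k: v for k, v in event.items() if k != id_key}
        # sort_keys makes the fingerprint independent of key order
        fingerprint = json.dumps(rest, sort_keys=True)
        groups[fingerprint].append(event.get(id_key))
    return [ids for ids in groups.values() if len(ids) > 1]
```

Each returned group is a list of ids whose payloads match exactly, so a pair like the one in the vimdiff above would come back as one group.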
[2021-04-18]
exobrain/data/exportsgdpr at master · seanbreckenridge/exobrain [[exports]] [[gdpr]]
Some thoughts on how easy to parse/use GDPR/get data exports from different services. A lot of these I did just because I was curious what information/context I could glean into the past about
highlights are in UTC [[remarkable]] [[koreader]]