
long time dreamers - archiving your dream journals?

obfusc8
Posted: Wed 17 Oct, 2018

Those of you who have been keeping dream journals for many years, how do you archive/tag/index your journals?

I've been keeping a dream journal for about 5 years now, and it's just a mess. Mine are stored electronically, but if you have a strategy for paper/physical journals too that would be of interest. I started with one 'note' for a month of dreams, because they were not very detailed at first. This progressed to a note/entry per night. Then I swapped to another online note-keeping bit of software, so half are in one, half in another. It's a gigantic mess that's growing daily, and I have no ability to go back and locate particular dreams, lucids, or dream themes/characters.

Anyone got any good strategies?

Some kind of tagging system maybe? (Not very good at this organisation thing.)


Siiw
Posted: Wed 17 Oct, 2018

I recently did a full backup of my dream journals here on LD4all.

I have 23 topics with 10 pages each. Fortunately, the moderator tools give me a way to see all 10 pages on one screen. From there, it is easy to save the entire topic as an HTML file. My partner made a script to extract the actual text of the posts from the local copies. It works well; it even fixes the encoding bug in the output.

This was so I could easily search through the text for how many times certain words were used. I can also easily search for a date, which can be hard to do with the current search tool here.
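
As a rough illustration of that kind of search, here is a minimal Ruby sketch. It is not the script mentioned above; the sample text and the method names `word_counts` and `lines_for_date` are made up for the example, and a real run would read the exported plain-text file instead.

```ruby
# Hypothetical sample of extracted journal text; a real run would read the
# plain-text export instead, e.g. File.read('journal.txt').
JOURNAL = <<~TEXT
  Wed 17 Oct, 2018
  I was flying over a forest. The forest was pink.
  Thu 18 Oct, 2018
  A dream about the sea. I was flying again.
TEXT

# Count case-insensitive whole-word occurrences of each word of interest.
def word_counts(text, words)
  words.to_h do |word|
    [word, text.scan(/\b#{Regexp.escape(word)}\b/i).size]
  end
end

# Return [line_number, line] pairs (1-based) for lines mentioning a date string.
def lines_for_date(text, date)
  text.each_line.with_index(1)
      .select { |line, _number| line.include?(date) }
      .map { |line, number| [number, line.chomp] }
end

p word_counts(JOURNAL, %w[flying forest sea])
p lines_for_date(JOURNAL, '18 Oct, 2018')
```

Matching on whole words avoids counting "sea" inside "search", which a plain substring search would do.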

My old physical DJs are manually tagged with major dream signs. I have been thinking of transferring them into a digital version. Maybe that script will finally make a format that can fuse the pre-LD4all DJs with the ones from here.



Susan_Y
Posted: Wed 17 Oct, 2018

What I do at the moment is keep a file with a section for each dream (HTML markup), and each dream has a date. Searching for particular words can be done with the usual text search function. I also have indexes into it for where interesting features occur.

I think I would like to do more systematic tagging of entries, so that I could do search or statistical analysis by tag.
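
One possible shape for that kind of tagging, as a minimal Ruby sketch. The `Entry` structure, the sample entries and the method names are all hypothetical; this only shows how a tag index and per-tag counts could be derived once each entry carries a list of tags.

```ruby
require 'date'

# Hypothetical entry structure: one record per dream, with a date and tags.
Entry = Struct.new(:date, :title, :tags)

# Made-up sample data for illustration only.
ENTRIES = [
  Entry.new(Date.new(2018, 10, 15), 'Pink lines in the forest', %w[lucid forest]),
  Entry.new(Date.new(2018, 10, 16), 'Late for the train', %w[train]),
  Entry.new(Date.new(2018, 10, 17), 'Flying over the sea', %w[lucid flying])
].freeze

# Build an index from each tag to the entries carrying it.
def index_by_tag(entries)
  index = Hash.new { |hash, tag| hash[tag] = [] }
  entries.each { |entry| entry.tags.each { |tag| index[tag] << entry } }
  index
end

# Count how many entries use each tag, most frequent first (ties by name).
def tag_counts(entries)
  entries.flat_map(&:tags).tally.sort_by { |tag, count| [-count, tag] }.to_h
end

puts index_by_tag(ENTRIES)['lucid'].map(&:title)
p tag_counts(ENTRIES)
```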

I have on my "to do" list to use a print-on-demand service to print a paperback book version of my dream journal. In principle, this just needs some style templates set up for how I want it printed, and then a PDF sent off to be printed.

The dream art could do with being linked to the dream descriptions, too...


FiXato
FiXato's journal extraction tool
Posted: Wed 17 Oct, 2018

Siiw wrote:
I recently did a full backup of my dream journals here on LD4all.

I have 23 topics with 10 pages each. Fortunately, the moderator tools give me a way to see all 10 pages on one screen. From there, it is easy to save the entire topic as an HTML file. My partner made a script to extract the actual text of the posts from the local copies. It works well; it even fixes the encoding bug in the output.

Actually, my work-in-progress tool also created the local copies. It doesn't rely on the mod's splitter tool. Theoretically, using a dump from that tool could reduce the number of page requests, but it would require moderator interaction, and would lack things like avatars.

Basically, what it does is:

  1. Take a list of URLs
  2. Send an HTTP POST request to log in, if the page requires authentication (using the Ruby gem (library) Faraday)
  3. Send an HTTP GET request for the document
  4. Store the result in a locally cached (HTML) file
  5. Parse the result with Nokogiri (a Ruby gem for parsing HTML/XML documents)
  6. Extract individual posts from that document using CSS selectors, as well as some details such as avatars (which it can also locally cache), post date, username, user profile link, original post link, etc, and store them in memory in a structured Hash variable.
  7. Find the 'Next' link for the next page in the journal, and repeat the previous couple of steps till there is no more Next link.
  8. Move on to the next DJ URL in the list
  9. Go through the in-memory data to export the contents as a simple plain-text file, as well as a fairly simple, but well-structured HTML document.


As this tool relies on the HTML version of the viewtopic pages, there are some limitations:

  • The tool currently only supports LD4All (though I'm trying to make it modular enough that extra extraction classes can be added fairly easily). Unfortunately, LD4All's HTML isn't the most structured, as it predates the common usage of classes and ids that can easily be used to select specific elements. For instance, I currently rely on finding the user profile link on the left side of the post, then finding its first table ancestor, from there looking up that table's first table data ancestor, and then taking its next (sibling) element, just to find the post container. I could probably look for .postdetails instead, but this way allows me to filter posts by a specific (set of) user(s) with a CSS selector, rather than still having to traverse them all.
  • Since it relies on plain HTML, it doesn't support the more Web 2.0 approach of loading content through JavaScript. If a page uses an AJAX call to add extra content to the page, the tool currently wouldn't see it. I'd have to implement something like PhantomJS or Watir to add support for that.
  • Also, since it relies on viewtopic, which only displays the processed HTML output of the post, you won't get the source of your post, including the raw BBCodes used. This shouldn't be a problem if you just want a local archive, but it would make it harder to import your posts into a different forum that also supports BBCode, as you'd have to write/use an HTML->BBCode converter.

    Theoretically I could work around this by extracting the original BBCode, querying each individual post via posting.php in quote mode, but that would be quite a lot of extra page requests, which I don't want to make as I don't want to stress the server.
    I do already have support for reducing server load a bit by waiting a few seconds between requests that miss the cache, but I'd still prefer not to stress webservers without their owners' permission. I'd want to ask Qu for permission first, before implementing that.
  • Ideally a forum or site would provide a JSON or XML feed with just the bare minimum of data (topic id, topic title, topic starter, post subject, post username, post body (both in formatted HTML as well as source BBCode), user avatar, user signature, post date, post id), preferably with a parameter to adjust the number of posts per page, to reduce the number of queries needed. This however would require server-side work with access to the database, which usually isn't an option. (And if you do have access to the database, you can just as well make a database dump.)
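
The "wait a few seconds between non-cache hits" behaviour mentioned in the list above could look something like the following minimal sketch. It is not the tool's actual code: the `PoliteCache` class, its parameters and the 3-second default are assumptions, and the fetcher and sleeper are injected so the example runs without network access or real pauses.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Small on-disk cache that only calls the fetcher (and only waits) on a miss.
class PoliteCache
  def initialize(dir, fetcher:, delay: 3, sleeper: ->(seconds) { sleep(seconds) })
    @dir = dir
    @fetcher = fetcher # callable: url -> body; would wrap the real HTTP GET
    @delay = delay     # seconds to wait between uncached requests (assumed value)
    @sleeper = sleeper # injectable so a demo can skip the actual pause
    @fetched_before = false
    FileUtils.mkdir_p(@dir)
  end

  def get(url)
    path = File.join(@dir, "#{Digest::SHA256.hexdigest(url)}.html")
    return File.read(path) if File.exist?(path)

    # Pause before every request after the first one, to space out server hits.
    @sleeper.call(@delay) if @fetched_before
    @fetched_before = true
    body = @fetcher.call(url)
    File.write(path, body)
    body
  end
end

Dir.mktmpdir do |dir|
  cache = PoliteCache.new(dir, fetcher: ->(url) { "<html>#{url}</html>" },
                               sleeper: ->(seconds) { puts "would wait #{seconds}s" })
  cache.get('http://example.invalid/a') # miss: fetched, no wait (first request)
  cache.get('http://example.invalid/a') # hit: read from disk
  cache.get('http://example.invalid/b') # miss: waits, then fetches
end
```

Because cached pages never reach the fetcher, re-running an export after a crash only requests the pages that were not saved yet.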


The tagging is currently done in a very naive way, by post-processing the HTML document that the previous script generated. In the final version this tagging can probably be done at the same time as the HTML document is generated, to save some reprocessing resources.
As I said, the tagging is fairly naive at the moment; just simple string/CSS selector matches. For instance, if a post includes the LD acronym tag, it assumes it's a Lucid Dream; the same goes for keyphrases such as 'become lucid' or 'lose lucidity'. Unfortunately this leads to quite a few false positives, but that feature was more of a proof of concept, and a way to quickly filter down posts to try and spot patterns.
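
To make the naive approach concrete, here is a minimal sketch on plain text. It is a simplification rather than the tool's code: it matches the bare acronym 'LD' as a word instead of the HTML acronym tag, and the names `LUCID_PHRASES` and `naive_tags` are invented for the example.

```ruby
# Keyphrases taken from the description above; the list is illustrative.
LUCID_PHRASES = ['become lucid', 'lose lucidity'].freeze

# Naive tagger: a post is tagged :lucid if it contains the standalone acronym
# 'LD' or any keyphrase. Deliberately simple, so false positives are expected.
def naive_tags(text)
  tags = []
  lucid = text.match?(/\bLD\b/) ||
          LUCID_PHRASES.any? { |phrase| text.downcase.include?(phrase) }
  tags << :lucid if lucid
  tags
end

p naive_tags('I become lucid and start flying.')      # genuine match
p naive_tags('I hope to become lucid tomorrow night') # false positive
p naive_tags('An ordinary dream about trains.')       # no match
```

The second call shows the weakness described above: a post that only talks about wanting to become lucid gets the same tag as an actual lucid dream.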

I'm currently refactoring and cleaning up the code, but I'll probably post the source code on GitHub once I'm done.


obfusc8
Re: FiXato's journal extraction tool
Posted: Thu 18 Oct, 2018

FiXato wrote:
I'm currently refactoring and cleaning up the code, but I'll probably post the source code on GitHub once I'm done.


The online note service(s) I've used have an extract/backup to HTML, so your tool will probably work, with a few tweaks. I did have the habit of tagging lucids in blue text, so instead of searching for [LD] or BBCode, it can probably search for chosen HTML tags instead. I will have a play about with the backup/export functions and see what it dumps out.

Sounds awesome, cool project, FiXato!

@Susan_Y - It sounds like you manually write/update an index. What do you put in the index, if you don't mind me asking? Brief description? Title? Tags?

@Siiw - Admittedly it was your post on colours in dreams that sparked ideas of better archiving and indexing of my DJs. Wow, 23 topics x 10 pages! I need to get something sorted out, pronto! :D


FiXato
Re: FiXato's journal extraction tool
Posted: Thu 18 Oct, 2018

obfusc8 wrote:

The online note service(s) I've used have an extract/backup to HTML, so your tool will probably work, with a few tweaks. I did have the habit of tagging lucids in blue text, so instead of searching for [LD] or BBCode, it can probably search for chosen HTML tags instead. I will have a play about with the backup/export functions and see what it dumps out.


Yeah, that shouldn't be a problem. Searching with this CSS selector would do the trick:
Code:
.postbody span[style^="color: blue"]
Example of searching from Ruby's interactive console using Nokogiri:
Code:

irb(main):005:0> doc.css('.postbody span[style^="color: blue"]').first
=> #<Nokogiri::XML::Element:0x2ab9a45e819c name="span" attributes=[#<Nokogiri::XML::Attr:0x2ab9a45edc78 name="style" value="color: blue">] children=[#<Nokogiri::XML::Text:0x2ab9a45ec8dc "Pink lines show up in the air on the pictures, and I realise that it is a dream. We start walking through the forest, and I sing loudly. It lasts for only a few moments before I wake up.">]>

(Dream fragment posted with permission)

You just have to be consistent in *only* using it for LD fragments, and not colour some other part blue as well. Unless you can combine it with a keyword that always exists inside the blue text section.


FiXato
Print-version
Posted: Fri 19 Oct, 2018

Hmm, I just realised the print-version of topics could also be a good way to extract post contents, especially as you can also get the entire topic on a single page that way. The downside though is that posts don't contain profile links, nor avatars, both of which imho are quite handy if you want to include comments from others in your archive.
For a pure DJ-only archive, it would definitely speed up the archive process though, with a minimal amount of server stress.

If profile links were included too, additional resources such as avatars could be uniquely indexed separately afterwards.

