Steem Pressure #8: Power Up Your dApp. RethinkDB? KISS my ASAP.

While building your Steem blockchain dApp, you will soon realize that it’s not that easy to get the data you are looking for quickly, unless there’s already an API that gives you what you need.

https://www.youtube.com/watch?v=7IkI6yVvzHI

We have several reliable, general purpose, “full” API nodes, provided by Steemit Inc. and community members:

Steem API endpoint                        Owner
https://api.steemit.com                   Steemit Inc.
https://anyx.io                           @anyx
https://api.steem.house                   @gtg
https://steemd.minnowsupportproject.org   @followbtcnews
https://steemd.privex.io                  @privex

You can use them for development, or even (if you are brave enough) for production. If your dApp is small and simple, that might work - and in many cases it will. But is it optimal? No.

In previous episodes, I described the different types of nodes, their names, and how those names can be misleading when it comes to the so-called “full” nodes.

TL;DR: On Steem, a “full node” means something different than in the Bitcoin realm, where a Bitcoin full node is pretty much equivalent to what a Steem consensus node can do. Here, on Steem, the word “full” doesn’t refer to anything related to the blockchain - it refers to the fully featured set of APIs enabled within steemd.

Do you really need a full set of API calls? In most cases you don’t. To get an idea about the information you can get from common Steem APIs, take a look at the devportal:

https://developers.steem.io/apidefinitions/

By the way, please consider voting for @inertia’s Steem.DAO proposal; he has already contributed greatly to the Dev Portal documentation and is eager to do more.

For example, if you need the information returned by get_market_history, take a look at condenser_api.get_market_history and at its underlying API, where you will find:

Also see: market_history_api.get_market_history

This means that the API node you use needs to be running the market_history plugin.
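
For instance, a minimal sketch of such a call from Python (using the requests library; the parameter values are arbitrary examples - check the devportal for the exact semantics of bucket_seconds, start, and end):

import requests

payload = {
    "jsonrpc": "2.0",
    "method": "market_history_api.get_market_history",
    # bucket_seconds/start/end as documented on the devportal;
    # the values below are just an illustration
    "params": {"bucket_seconds": 3600,
               "start": "2019-08-01T00:00:00",
               "end": "2019-08-02T00:00:00"},
    "id": 1,
}
print(requests.post("https://api.steemit.com", json=payload).json())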

Another example is a situation in which you want to find out who “Resteemed” (a.k.a. “Reblogged”) a given post. Here you are lucky, because there’s a method for that, get_reblogged_by, provided by the follow plugin.

Maybe you want to get the tags used by a specific author? That’s easy too: there’s the get_tags_used_by_author method, provided by - yes, you guessed correctly - the tags plugin.

Chances are that if your app needs get_market_history, it doesn’t need the tags or follow plugins, because it’s not blogging-oriented - and vice versa: if you deal with tags and follows, you probably don’t care about the internal market.

Such knowledge lets you optimize the API infrastructure needed for your dApp. For instance, you can run a “fat node” without market_history or account_history (see Steem Pressure #6) and use hivemind to get the tags and follow features.

That’s not the end of it. You often get a lot more data than you asked for, and yet it still may not be what you need.

Do you know why? If not, then please, read the first paragraph again.

How many blocks has the witness gtg missed so far? That’s easy. Run:

curl -s --data '{"jsonrpc":"2.0", "method":"database_api.find_witnesses", "params": {"owners":["gtg"]}, "id":1}' https://api.steem.house

You will get your answer instantly, because the find_witnesses method returns this value… buried among many other pieces of information that you don’t need.
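
And if the missed-blocks counter is all you care about, picking it out of the response takes just a few lines of Python (a sketch - it assumes the response layout documented on the devportal, with a witnesses array whose entries carry a total_missed field):

import requests

payload = {
    "jsonrpc": "2.0",
    "method": "database_api.find_witnesses",
    "params": {"owners": ["gtg"]},
    "id": 1,
}
response = requests.post("https://api.steem.house", json=payload).json()
# ignore the rest of the witness object, keep the single field we need
print(response["result"]["witnesses"][0]["total_missed"])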

Now, try something more complicated:

Problem to solve

How many blocks did @gtg produce up to block 40000000?
How many of those did he produce in August 2017?
How many of the blocks he produced contained more than 50 transactions?
What percentage of blocks produced in 2018 had more non-virtual operations than transactions?
How many blocks had only one transaction, and which of them had the highest number of non-virtual operations? What is the id of the transaction that contains them?

You should be able to answer all these questions; all the data is in the blockchain, after all. The problem is that steemd has no suitable API to answer them quickly. This means that you either have access to a third-party service / API that can answer them (see @arcange’s SteemSQL, an “SQL server database with all Steemit blockchain data”, available on a subscription model), or you have to look into the blocks yourself.

In the latter case, you will have to inspect every block ever produced, using the get_block method. The good news is that every consensus node can run the block_api plugin. The bad news is that the Steem blockchain has recently passed the 40-million-block mark, which means that to answer even the first question you will need to send more than 40 million requests to the API endpoint. And each time this type of question comes up, you will have to repeat the whole procedure - unless you…

Build and run your own custom Steem API

steemit-meme-api.jpg
TL;DR in Memeglish so that project managers can get it.

No. I’m not going to build an API in this post. It is supposed to be your own API, solving your problems, optimized for your use case. Instead, I’m going to play with some blockchain data to give you some tips.

To illustrate this, let’s analyze the problem described above. Let’s ignore incoming blocks and keep our focus on the range up to block 40000000 (catching up with the head block and dealing with micro-forks is a good topic for another episode; in the meantime, you can take a look at how hivemind does this). Let’s also assume that you can’t or don’t want to extend steemd with your own fancy_plugin or a fancy_plugin_api that does the job.

Keep It Simple, Stupid

We don’t need information about virtual operations, so instead of using get_ops_in_block, which requires the account_history plugin and its API, we can use get_block, which saves a lot of resources if you want to run your own node.

How does a request look?

curl -s --data '{"jsonrpc":"2.0", "method":"block_api.get_block", "params":{"block_num":197696}, "id":1}' https://api.steem.house

How does a response look?

{
 "jsonrpc": "2.0",
 "result": {
  "block": {
   "previous": "0003043f5746b4d9d06932467ad852ac5d71231c",
   "timestamp": "2016-03-31T13:56:18",
   "witness": "sminer22",
   "transaction_merkle_root": "63321e3f001bed17d301399c5eeaa9b37a0bf74b",
   "extensions": [],
   "witness_signature": "1f64e94d51baef0b84be04644fafc0b05959e80db64de7b099d00499587f23f97216acd3d8df3c3b81f672b24f8cf47167720d205693d3f5713a65bcf004cffef3",
   "transactions": [
    {
     "ref_block_num": 1087,
     "ref_block_prefix": 3652470359,
     "expiration": "2016-03-31T13:56:45",
     "operations": [
      {
       "type": "vote_operation",
       "value": {
        "voter": "proskynneo",
        "author": "proskynneo",
        "permlink": "steemit-firstpost-1",
        "weight": 10000
       }
      }
     ],
     "extensions": [],
     "signatures": [
      "202765ea6b688c2ce968ff2b3f1f6715fddd67374f409a202ae720ca072cb6a919175c27921edfc24c5e3d898a221e88395af31a155b1d60b2bd5475d30a30e226"
     ]
    }
   ],
   "block_id": "00030440d0fbfc8323a1388cd009f2a7f9f43162",
   "signing_key": "STM6tC4qRjUPKmkqkug5DvSgkeND5DHhnfr3XTgpp4b4nejMEwn9k",
   "transaction_ids": [
    "679bcc6414b2f2bd16818a83f020bb9a478a84c4"
   ]
  }
 },
 "id": 1
}

Well, that’s pretty-printed for your convenience; the API normally returns a compact version, which is just one line without extra whitespace.

Caveat

As you can see, the returned data doesn’t contain the block number itself, but you can set the request id equal to the block number you are asking for, so that each response identifies its block.
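
Putting the request format and this caveat together, a naive fetcher for the whole range could look like this (a sketch only - sequential, no error handling; blocks_raw.json and the endpoint are placeholders, and you really want your own node here):

import json
import requests

ENDPOINT = "https://api.steem.house"  # better: your own local node

with requests.Session() as s, open("blocks_raw.json", "w") as out:
    for block_num in range(1, 40_000_001):
        payload = {"jsonrpc": "2.0",
                   "method": "block_api.get_block",
                   "params": {"block_num": block_num},
                   # id == block_num, so each response carries
                   # the number of the block it describes
                   "id": block_num}
        out.write(json.dumps(s.post(ENDPOINT, json=payload).json()) + "\n")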

We need to get 40M blocks. That’s a lot, but fortunately you need to do this only once: irreversible blocks are not going to change, because… well, I hope you can guess why ;-) Even if you use jussi’s batch requests, throwing that many requests at public API nodes will quickly trigger rate limiting - and even if it doesn’t, it will take a significant amount of time.
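
For completeness: a batch is just a JSON array of ordinary requests, so fetching, say, 50 blocks per round trip could look roughly like this (batch size limits differ between endpoints, so treat the number as an assumption):

import requests

batch = [{"jsonrpc": "2.0",
          "method": "block_api.get_block",
          "params": {"block_num": n},
          "id": n}  # again: id == block_num
         for n in range(1_000_001, 1_000_051)]
responses = requests.post("https://api.steemit.com", json=batch).json()
print(len(responses), "blocks in one round trip")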

Your own consensus node for simple tasks

You can use your own consensus Steem node for that. Local means faster: think about how even a tiny latency (1ms? 2ms?) is no longer negligible once you multiply it by 40000000. Such a node is not resource hungry, and with MIRA you can easily run it on an 8GB RAM machine with 500GB of storage (a consensus node currently needs 245GB for the block_log and 56GB for the state file). Besides, your dApp will also benefit from such a node for broadcasting purposes. Configuration is as simple as this:

https://cdn.steemitimages.com/DQmfJxAnAk2jvJMEoo3U3ivLDhie3vPvXxmSU8RfFLMrYcQ/config-ini.gif
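
If the animation is hard to follow, the gist of it is a handful of lines along these lines (a sketch - plugin and option names as used by steemd around HF21; verify them against the example config shipped with your version):

# minimal consensus node that can also serve blocks
plugin = witness
plugin = database_api network_broadcast_api block_api
p2p-endpoint = 0.0.0.0:2001
webserver-http-endpoint = 127.0.0.1:8090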

Use someone else’s hard work

There are already people (like me) who serve the block_log to speed up replay times. If only there were a place where one could bulk download JSON files with Steem blocks... Well, if there’s a need for it, I can make such files available in 1M-block packages every time we reach another 5 or 10 million blocks. It would save a lot of time for all the microservices that require block processing (the last mile can be synced using traditional methods).

Bulk

You’ve already seen how a single block looks in JSON format. Obviously, we use the compact output to save space. The whole data set (40M blocks in uncompressed JSON format) takes 580GB, and for our needs we require only a small subset of it.

To get what’s needed we can use jq:

jq -c '
{
 id:        .id,
 block_id:  .result.block.block_id,
 timestamp: .result.block.timestamp,
 witness:   .result.block.witness,
 txs:       .result.block.transactions | length,
 ops:      [.result.block.transactions[].operations[]] | length,
 txids:     .result.block.transaction_ids
}'
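
Assuming the raw responses are stored one JSON object per line in blocks_raw.json (as in the fetcher sketch earlier), and you save the filter above as blocks_filter.jq, the whole conversion is a single pass:

jq -c -f blocks_filter.jq blocks_raw.json > blockstream.json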

After processing our blocks we get 46GB of uncompressed JSON data.

Everything Should Be Made as Simple as Possible, But Not Simpler

A flat file? Are you kidding me? Use your favorite database to store it. (Unless your favorite database is MongoDB, then use something else.)

Or maybe RethinkDB?

Some time ago @someguy123 told me about some fancy database designed to store JSON.

”RethinkDB is the first open-source, scalable JSON database built from the ground up for the realtime web. It inverts the traditional database architecture by exposing an exciting new access model - instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results to applications in realtime. RethinkDB’s realtime push architecture dramatically reduces the time and effort necessary to build scalable realtime apps.”

So this not only means effective and effortless storage of JSON data, but also sounds like a perfect choice for many Steem dApps. But is it? Maybe. For some needs. It’s not perfect for sure, but no solution is. I’m going to give it a try. Do your own research. Maybe it will fit your needs. If not, look for something else.

Let’s see how easy it is to store the data we need in RethinkDB.

Step 1: Install RethinkDB

Check the RethinkDB webpage for installation instructions suitable for your server. For Debian or Ubuntu, it’s enough to do the usual:

# Add the RethinkDB repository
echo "deb https://download.rethinkdb.com/apt `lsb_release -cs` main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
wget -qO- https://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
sudo apt-get update

# Install the RethinkDB package
sudo apt-get install rethinkdb

That’s all. No questions asked.

Install the RethinkDB python driver

Because the rethinkdb import subcommand has - surprisingly - an external dependency on the Python driver, you might need to install the driver too, depending on how you installed RethinkDB. For that you need pip and, ideally, virtualenv, so if you don’t have them yet:

sudo apt-get install python3-pip python3-venv

and once you have them, install the RethinkDB python driver into your virtual environment:

python3 -m venv ./rethink
source rethink/bin/activate
pip install rethinkdb

Step 2: Run the server

rethinkdb

Yes, that’s it for now - you can play with more advanced features later. It also starts an administrative console, available over HTTP on localhost:8080.

Step 3: Load your data

Throw your JSON data at the RethinkDB instance by running:

rethinkdb import -f blockstream.json --table steem.blocks

where blockstream.json is your big 46GB file with data from 40M blocks. It goes into a table called blocks within a database called steem.

  [========================================] 100%
  40000000 rows imported to 1 table in 3615.68 secs
  Done (3615 seconds)

Not bad - on my hardware it consumes the JSON stream a little faster than a 100Mbps link could deliver it.

The resulting rethinkdb_data directory takes 66GB.

rethinkdb.png

Step 4: Make use of the data

You can use the Data Explorer in the administrative console, or use Python, JavaScript, or Ruby to get the data you are looking for.
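
To give you a taste, here is how a few of the questions from the beginning of this post could be asked from Python against the table we just imported (a sketch; with older drivers the import is import rethinkdb as r instead):

from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect("localhost", 28015)  # default client driver port
blocks = r.db("steem").table("blocks")

# How many blocks did gtg produce up to block 40000000?
print(blocks.filter({"witness": "gtg"}).count().run(conn))

# ...and how many of them in August 2017? (timestamp is a plain string)
print(blocks.filter((r.row["witness"] == "gtg")
                    & r.row["timestamp"].match("^2017-08")).count().run(conn))

# ...and how many contained more than 50 transactions?
print(blocks.filter((r.row["witness"] == "gtg")
                    & (r.row["txs"] > 50)).count().run(conn))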

If you want to learn more, take a look at the RethinkDB documentation.

Depending on your needs, you can create and use secondary indexes to speed up your queries.
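
For example, an index on witness turns the full-table scans above into index lookups (same hypothetical connection as before):

# one-time index creation
blocks.index_create("witness").run(conn)
blocks.index_wait("witness").run(conn)

# then query via the index instead of scanning 40M documents
print(blocks.get_all("gtg", index="witness").count().run(conn))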

Or...

Use a different set of tools and solutions that meet your needs. A less exotic option is MySQL, which supports a native JSON data type; combined with generated columns, it might be a good alternative. (I’m going to try that too.)

Previous episodes of Steem Pressure series

https://www.youtube.com/watch?v=YRwHrVVU2NA

Introducing: Steem Pressure #1
Steem Pressure #2 - Toys for Boys and Girls
Steem Pressure #3 - Steem Node 101
Steem Pressure: The Movie ;-)
Steem Pressure #4 - Need for Speed
Steem Pressure #5 - Run, Block, Run!
Steem Pressure #6 - MIRA: YMMV, RTFM, TLDR: LGTM
Steem Pressure #7 - Go Fork Yourself!

Stay tuned for the next episodes of Steem Pressure :-)


If you believe I can be of value to Steem, please vote for me (gtg) as a witness. Use Steemit’s Witnesses List or SteemConnect. Or, if you trust my choice, you can set gtg as a proxy that will vote for witnesses for you. Your vote does matter! You can contact me directly on steem.chat, as Gandalf.

https://steemitimages.com/DQmSheuDfCHizk1xvHPcrFjQNKfBzgun9UXDxdEp6JJCum9/steem_wide.png Steem On


Comments
  • @witnessnews

    This post has been resteemed by @witnessnews.

    Follow @witnessnews to keep up with active witness updates.

  • @tarazkp

    I am trying it on my phone now, thanks! ;D

  • @katrina-ariel

    I'm definitely not the target audience for this post, but I'm glad you're doing what you do, so I don't have to. ;) Stay awesome!

  • @kenny-crane

    I think this post and series will be useful for those needing to efficiently get a lot of data from the STEEM blockchain. For my limited uses, steem.js has been working fine.

    <script src="https://cdn.jsdelivr.net/npm/steem/dist/steem.min.js"></script>
    


    By the way, the link for Steem Pressure #7 seems broken. Here's where that post is:

    https://steemit.com/steem-pressure/@gtg/steem-pressure-7-go-fork-yourself-step-by-step-guide-to-building-and-setting-up-a-mira-powered-hf21-ready-steem-consensus-node

  • @detlev

    A useful post, and sure, you’ve had my vote for a long time.

    And today, I add as well a !BEER for you.

    PS: Is there someone who might help me to get a node up and advise us in monitoring?

  • @beerlover
    https://cdn.steemitimages.com/DQmaHThyECGhEx8tSfHZbiMFRNYjJ35K92cDgiJjkzBUaJo/One%20sip%20of%20BEER%20for%20you.gif

    View or trade BEER.


    Hey @gtg, here is a little bit of BEER from @detlev for you. Enjoy it!

    Learn how to earn FREE BEER each day by staking.

  • @derangedvisions

    8972E1D8-E1FC-40FD-AA9E-71B6E8852F96.jpeg

  • @afrog

    I have no time at the moment to follow this posting, but I guess it will be quite a good source for my purposes too. Before I go on, I have to read your manual, dear @gtg. Thank you for this guide.

  • @steemzzang

    I hope you build a Steem that becomes hope for everyone. I hope you become a witness respected by many Steemians.

  • @steem.newz

    💪💪

  • @techcoderx

    Interesting. Offering custom APIs may be the way to go for techcoderx.com full node to stand out.

  • @dwiesiekiery

    The Universe answered. Your post is, like, a godsend to me. I had struggled with the idea of starting to learn Python for some time; with this Steem Pressure series you’ve convinced me. Especially the 7th episode.

  • @paul3080

    Nice write up. But not quite helpful to me though.

  • @bobinson

    PostgreSQL's JSON support seems to be worth exploring too.

  • @cryptonewz

    This will be a good thing for steemit..

  • @kaptainkrayola

    Wow, someone else on the internet that likes RethinkDB and is suggesting using it for projects? That's awesome, I thought I was the only one!

  • @themarkymark

    (Unless your favorite database is MongoDB, then use something else.)

    ded

  • @moebiusdroste

    This is simply amazing - human ideas organized by crypto, and now available for analysis.

  • @albertowhite

    this will be very good for steemit

  • @anthonyadavisii

    This is a fairly meaty post. It may require splitting into a few servings to fully digest. Thanks @gtg

    P.S. using this comment for testing. Test !CHOPS 4

  • @helpiecake

    Hello!

    This post has been manually curated, resteemed and gifted with some virtually delicious cake from the @helpiecake curation team!

    Much love to you from all of us at @helpie! Keep up the great work!

    helpiecake

    Manually curated by @solominer.

    @helpie is a Community Witness.

  • @chops.token

    https://steemitimages.com/240x0/https://ipfs.busy.org/ipfs/QmSeZ6cDEcAY54q5eVfrfiFminsb3vnBqhX4DpKZ51x1L3


    Hi gtg, this write-up was particularly meaty, fulfilling, wholesome, beefy and deserving of a CHOPS token. Don't Chomp on it all at once!
    To view or Trade CHOPS visit steem-engine.com

    Check out the #chops-token channel on the SFR Discord

  • @vvk

    Nice article. MySQL? Try PostgreSQL, its json / jsonb handling is really fast.