    Crypto Data Scale Problems – Kerman Kohli

    By Yeek.io, December 4, 2024

    It’s 2024, and you’d think getting crypto data is easy: Etherscan, Dune and Nansen let you see whatever data you want, whenever you want. Well, kind of.

    You see, in normal web2 land, a company with 10 employees and 100,000 customers probably produces no more than a few hundred gigabytes of data at the upper end. That’s small enough that your iPhone could store everything and crunch any question you have. However, once you have 1,000 employees and 100,000,000 customers, you’re probably dealing with hundreds of terabytes, if not petabytes.
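A quick back-of-envelope check shows these figures hang together if data grows roughly linearly with the customer count. The 300 GB starting point is a hypothetical stand-in for “a few hundred gigabytes”, and linear growth is an assumption, not a measurement:

```python
GB = 10**9
TB = 10**12


def project_bytes(base_bytes: float, base_customers: int, target_customers: int) -> float:
    """Scale a data volume linearly with customer count (a simplifying assumption)."""
    bytes_per_customer = base_bytes / base_customers
    return bytes_per_customer * target_customers


base = 300 * GB  # hypothetical "few hundred gigabytes" at 100,000 customers
projected = project_bytes(base, 100_000, 100_000_000)
print(f"{base / 100_000 / 10**6:.1f} MB per customer")      # 3.0 MB per customer
print(f"{projected / TB:.0f} TB at 100,000,000 customers")  # 300 TB at 100,000,000 customers
```

A thousandfold increase in customers turns hundreds of gigabytes into hundreds of terabytes; any per-customer growth on top of that pushes the total toward petabytes.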

    This is a fundamentally different challenge, because that scale demands far more care. To process hundreds of terabytes of data, you need a distributed cluster of computers to send jobs to. When sending these jobs, you have to think about:

    • What happens if a worker fails to do their job

    • What happens if one worker takes a lot longer than the others

    • How do you decide which job goes to which worker

    • How do you combine all of their results together and ensure the computation was done correctly
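To make those four questions concrete, here is a minimal single-machine sketch that uses Python threads as stand-in “workers”. It retries failed jobs, launches one backup copy of any job that runs too long (speculative execution, the approach popularised by MapReduce’s backup tasks), and merges only once every job has reported. The job layout, timeouts and the flaky test worker are all hypothetical; at real scale you would lean on a distributed framework rather than hand-rolled code like this.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def count_multiples_of_3(start, end):
    """Stand-in for real work, e.g. counting matching events in blocks [start, end)."""
    return sum(1 for block in range(start, end) if block % 3 == 0)


def run_jobs(jobs, worker, max_workers=4, max_attempts=3, straggler_secs=0.2):
    """Run {job_id: args} on a pool; retry failures, back up stragglers, then merge."""
    results = {}       # job_id -> result of the first attempt that succeeded
    attempts = {job_id: 0 for job_id in jobs}
    backed_up = set()  # jobs that already received a speculative copy
    pending = {}       # future -> (job_id, submit time)

    # The pool's internal queue decides which worker gets which job:
    # whichever thread is free next picks up the next queued job.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        def submit(job_id):
            attempts[job_id] += 1
            future = pool.submit(worker, *jobs[job_id])
            pending[future] = (job_id, time.monotonic())

        for job_id in jobs:
            submit(job_id)

        while len(results) < len(jobs):
            done, _ = wait(pending, timeout=straggler_secs, return_when=FIRST_COMPLETED)
            for future in done:
                job_id, _ = pending.pop(future)
                if job_id in results:
                    continue  # another copy of this job already finished
                try:
                    results[job_id] = future.result()
                except Exception as exc:
                    # Worker failed: retry while the attempt budget allows.
                    if attempts[job_id] < max_attempts:
                        submit(job_id)
                    elif all(j != job_id for j, _ in pending.values()):
                        raise RuntimeError(f"job {job_id} failed {attempts[job_id]} times") from exc
            # Straggler check: give a slow job one backup copy; first to finish wins.
            now = time.monotonic()
            for job_id, submitted in list(pending.values()):
                if (job_id not in results and job_id not in backed_up
                        and now - submitted > straggler_secs
                        and attempts[job_id] < max_attempts):
                    backed_up.add(job_id)
                    submit(job_id)
        # Leaving the `with` block waits for abandoned straggler copies to exit.

    # Merge step: combine the per-job partial results into one answer.
    return sum(results.values())


def make_flaky_worker(fail_start, slow_start, delay=0.5):
    """Test double: the job at fail_start crashes once; the job at slow_start stalls once."""
    calls = {}
    lock = threading.Lock()

    def worker(start, end):
        with lock:
            calls[start] = calls.get(start, 0) + 1
            first_call = calls[start] == 1
        if first_call and start == fail_start:
            raise OSError("simulated worker crash")
        if first_call and start == slow_start:
            time.sleep(delay)  # simulated straggler
        return count_multiples_of_3(start, end)

    return worker


jobs = {i: (i * 1000, (i + 1) * 1000) for i in range(8)}  # 8 hypothetical block ranges
print(run_jobs(jobs, make_flaky_worker(fail_start=2000, slow_start=5000)))  # 2667
```

Despite one crash and one straggler, the merged total matches what a single pass over blocks 0 to 7999 would give, which is the property a real scheduler must also guarantee.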

    These are all considerations you need to think about when running big data computations across multiple machines. Scale breeds problems that are invisible to those who don’t work with it, and data is one of those domains where the more you scale up, the more infrastructure you need to manage it correctly. Handling this scale also brings additional challenges:

    • Extremely specialised talent that knows how to operate machines at this scale

    • The cost to store and compute all the data

    • Forward planning and architecture to ensure your needs can be supported

    It’s funny: in web2 everyone wanted the data to be public. In web3 it finally is, but very few know how to do the work needed to make sense of it. One deceptive aspect is that, with some assistance, you can extract your own slice of the global dataset fairly easily. So “local” data is easy to get, while “global” data (things that pertain to everyone and everything) is hard.

    As if the scale weren’t challenging enough, there is another dimension that makes crypto data hard: continuous fragmentation driven by the market’s financial incentives. For example:

    • Rise of new blockchains. There are close to 50 L2s live, 50 more known to be upcoming and hundreds more in the pipeline. Each L2 is effectively a new database source that needs to be indexed and configured. Hopefully they’re standardised, but you can’t always be sure!

    • Rise of new virtual machines. The EVM is just one domain. The SVM, Move VM and countless others are coming to market. Each new type of virtual machine means an entirely new data schema that has to be understood from first principles. How many VMs will there be? Well, investors will happily incentivise a new one to the tune of billions of dollars!

    • Rise of new account primitives. Smart contract wallets, hosted wallets and account abstraction complicate how you actually interpret the data. The "from" address may not be the real user, because the transaction was submitted by a relayer and the real user is buried somewhere inside it (if you look hard enough).
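As a rough illustration of the account-abstraction case, the sketch below attributes an ERC-4337 bundle to the smart accounts named in its UserOperationEvent logs rather than to the bundler in the "from" field. The Tx and DecodedLog types are hypothetical and assume logs were already decoded upstream; other relaying patterns (meta-transactions, custodial wallets) would each need their own rule.

```python
from dataclasses import dataclass, field


@dataclass
class DecodedLog:
    """Hypothetical pre-decoded event log: event name plus named arguments."""
    event: str
    args: dict


@dataclass
class Tx:
    """Hypothetical transaction record; from_address is the EOA that signed it."""
    tx_hash: str
    from_address: str
    logs: list = field(default_factory=list)


def effective_users(tx: Tx) -> list:
    """Accounts this transaction should be attributed to, in first-seen order."""
    # Under ERC-4337 a bundler submits the transaction, so from_address is the bundler,
    # while the EntryPoint's UserOperationEvent names the smart account as `sender`.
    # A real indexer must also check the log came from the EntryPoint contract; skipped here.
    senders = [log.args["sender"] for log in tx.logs if log.event == "UserOperationEvent"]
    if senders:
        # One bundle can carry operations from many accounts: dedupe, keep order.
        return list(dict.fromkeys(senders))
    # No user operations found: fall back to the signer (plain EOA transaction).
    return [tx.from_address]


bundle = Tx("0xabc", "0xbundler", [
    DecodedLog("UserOperationEvent", {"sender": "0xalice"}),
    DecodedLog("UserOperationEvent", {"sender": "0xbob"}),
])
print(effective_users(bundle))  # ['0xalice', '0xbob']
```

Counting "from" addresses here would report one user (the bundler) instead of two, which is exactly how naive metrics drift from reality.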

    Fragmentation is particularly challenging because you can’t quantify what you don’t know. You will never know every L2 that exists or every virtual machine that will ever launch. You can keep up with them once they reach enough scale, but that’s a story for another time.

    This last point catches a lot of people by surprise: yes, the data is open, but no, it is not easily interoperable. You see, the smart contracts each team pieces together form a little database inside a larger database. I like to think of them as schemas. All the data is there, but how to piece it together is usually understood only by the team that developed the smart contracts. You can spend time understanding it yourself if you’d like, but you’ll have to do it hundreds of times for all the potential schemas. And how will you even afford that without burning through large sums of money with no buyer on the other side of the transaction?

    In case this feels too abstract, let me provide an example. Say you ask, “How much does this user utilise bridges?” Although that looks like one question, it contains several nested problems. Let’s break it down:

    • First, you need to know every bridge that exists on the chains you care about. If that means all chains, we’ve already covered why that is challenging.

    • Then, for each bridge, you need to understand how its smart contracts work.

    • Once you’ve understood all the permutations, you need to design a model that unifies all these individual schemas.

    Each of these steps is hard to get right and highly resource intensive.
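A minimal sketch of the bridge question, assuming two hypothetical bridges whose decoded events use different field names and encodings. Each adapter maps one bridge’s schema into a shared BridgeTransfer model, and events from bridges without an adapter are reported rather than silently dropped, since you can’t count what you haven’t modelled. All bridge names and fields below are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BridgeTransfer:
    """Unified model that every bridge-specific schema is mapped into."""
    bridge: str
    user: str
    src_chain: str
    dst_chain: str
    amount: int  # smallest token units


def adapt_bridge_a(raw: dict) -> BridgeTransfer:
    # Hypothetical bridge A: explicit source/destination fields, integer amount.
    return BridgeTransfer("bridge_a", raw["depositor"], raw["fromChain"], raw["toChain"], raw["value"])


def adapt_bridge_b(raw: dict) -> BridgeTransfer:
    # Hypothetical bridge B: route encoded as "src->dst", amount as a decimal string.
    src, dst = raw["route"].split("->")
    return BridgeTransfer("bridge_b", raw["owner"], src, dst, int(raw["amt"]))


ADAPTERS: dict[str, Callable[[dict], BridgeTransfer]] = {
    "bridge_a": adapt_bridge_a,
    "bridge_b": adapt_bridge_b,
}


def bridge_usage(raw_events):
    """Count transfers per user; raw_events yields (bridge_name, raw_event_dict)."""
    usage, unknown = Counter(), set()
    for bridge, raw in raw_events:
        adapter = ADAPTERS.get(bridge)
        if adapter is None:
            unknown.add(bridge)  # a bridge nobody has modelled yet
            continue
        usage[adapter(raw).user] += 1
    return usage, unknown


events = [
    ("bridge_a", {"depositor": "0xalice", "fromChain": "ethereum", "toChain": "base", "value": 5}),
    ("bridge_b", {"owner": "0xalice", "route": "base->arbitrum", "amt": "7"}),
    ("bridge_b", {"owner": "0xbob", "route": "ethereum->base", "amt": "1"}),
    ("bridge_c", {"who": "0xalice"}),
]
usage, unknown = bridge_usage(events)
print(dict(usage), unknown)  # {'0xalice': 2, '0xbob': 1} {'bridge_c'}
```

Every new bridge means another adapter written by someone who has studied its contracts, which is where the cost described above comes from.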

    So what does this all lead to? Well, the state of the ecosystem we have today:

    • An ecosystem where no one really knows what’s happening. There’s just a hand-wavy notion of activity that is hard to quantify properly.

    • Inflated user counts and sybils that are hard to detect. Metrics become irrelevant and untrustworthy! Whether activity is real or fake stops mattering to market participants because it all looks the same.

    • Major obstacles to making on-chain identity real. A strong sense of identity requires accurate data; otherwise, identities end up misrepresented!

    I hope this article has helped open your eyes to the realities of the data landscape in crypto. If you are facing any of these issues or want to learn how to overcome them, reach out — my team and I are tackling these.
