Tabular

Series: Ask the Iceberg Expert
Guest: Ryan Blue, co-creator of Iceberg, and co-founder of Tabular
Subject: Introduction to Iceberg and its origins at Netflix (Iceberg 101)
iceberg.apache.org
www.tabular.io
#iceberg #datalake #tabular #ryanblue

In this video, we demonstrate how to use the PyIceberg CLI. For the demo, we use the docker-spark-iceberg setup that's available here: https://github.com/tabular-io/docker-spark-iceberg
You can also read the companion article at https://tabular.medium.com/reading-apache-iceberg-from-python-with-pyiceberg-8b8cff36f4f0

First, we create a table using Spark through the Jupyter notebook.
Next, we browse the catalog using the `pyiceberg` CLI. We install pyiceberg from pip using `pip install "pyiceberg[pyarrow]"`.

For a complete overview of all the installation options, please refer to the documentation:
https://py.iceberg.apache.org/

Next, we demonstrate several commands, such as list, describe, and files, to retrieve information about the Iceberg tables. At the end, we show how easy it is to accidentally drop a table using the CLI.
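The commands mentioned above can be sketched as a small Python helper that builds the `pyiceberg` command lines. This is only an illustration under stated assumptions: the catalog address `http://localhost:8181`, the namespace `nyc`, and the table identifier `nyc.taxis` are hypothetical placeholders, not values taken from the video, and should be replaced with whatever your own setup uses.

```python
import shlex

# Assumption: a REST catalog is reachable at this address; adjust for your setup.
CATALOG_URI = "http://localhost:8181"
# Hypothetical namespace and table identifier, used only for illustration.
NAMESPACE = "nyc"
TABLE = "nyc.taxis"


def pyiceberg_cmd(*args, uri=CATALOG_URI):
    """Build the argument list for one `pyiceberg` CLI invocation."""
    return ["pyiceberg", "--uri", uri, *args]


# Read-only commands: safe to run repeatedly.
READ_COMMANDS = [
    pyiceberg_cmd("list"),             # list the namespaces in the catalog
    pyiceberg_cmd("list", NAMESPACE),  # list the tables in one namespace
    pyiceberg_cmd("describe", TABLE),  # show table metadata
    pyiceberg_cmd("files", TABLE),     # list the files that make up the table
]

# Destructive command: kept separate so it is never run by accident.
DROP_COMMAND = pyiceberg_cmd("drop", "table", TABLE)

if __name__ == "__main__":
    # Print the commands instead of executing them, so nothing is dropped.
    for cmd in READ_COMMANDS:
        print(shlex.join(cmd))
    print("# destructive, run only on purpose:")
    print("# " + shlex.join(DROP_COMMAND))
```

Keeping the drop command apart from the read-only ones reflects the point made in the video: dropping a table takes a single short command, so it is worth making that step deliberate.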

If there are any questions, please reach out using the Iceberg Slack: https://iceberg.apache.org/community/

or open an issue or pull request on GitHub: https://github.com/apache/iceberg

Join Ryan Blue, Daniel Weeks, and Jason Reid for a candid fireside chat. They discuss why they decided to build a data platform, what problems Tabular solves, who they are building it for, and—most importantly—why data engineers will love it.

Video Transcription
Jason Reid: [cold start] …and the nice thing is it doesn’t matter where you’re starting from. It could be you're starting from "I have event data I want to stream into something, I don’t know exactly what I want to do with it yet, but I’ve got to collect it somewhere." Or it could be you have data and you want to run simple ad-hoc SQL against it using, you know, Athena or some sort of simple Trino. Or it could be that you’re doing deep ML. No matter where you start, Tabular and Iceberg can support that use case, and then when you’re ready for the other ones, nothing new has to happen. Right? You just plug in that next engine like we talked about. And so you can just start anywhere and be confident that you can grow in any direction as your needs evolve.

[…]

Ryan, you’re the co-founder and CEO at Tabular, maybe you can start with why did you bring me along on this journey.

Ryan Blue: You know the data engineering side and the technology side and how they should fit together. And I always thought that your perspective at Netflix was invaluable to informing what we needed to build. And that’s exactly what you’re doing now.

Jason: So far so good.

Ryan: Yeah, that worked out amazingly well.

Jason: …just as long as we make data engineers' lives easier, I’m pretty happy.

Ryan: Exactly! That’s why we exist.

Jason: Dan, Ryan convinced you to come along on this crazy journey with us. What’s the big reason you decided it was time to leave Netflix and do Tabular?

Dan: I went to Tabular to lead engineering and build a platform that people could reuse without all the complexity and the cost. And get the latest state-of-the-art, building on top of Iceberg, in a world where engines can interconnect and you have incredible capability across lots of different platforms.

Jason: I think that one of the appeals of the modern data stack is that it allowed companies to get up and running, doing relatively sophisticated data things at scale, very easily. Right? And if there's any really big complaint about the modern data stack, it's that it was so easy to spin up to do data things that it gets out of control quickly, and now your cost vectors are out of control but you’re likely locked into the architecture that you have for various reasons. And I think what Tabular provides is another avenue there, where you still get the simplicity to spin up something quickly, get started, prove value, all of the things that we love about the modern data stack, but with something that is also based on an open format and has the cost and scalability mechanics of cloud.

Dan: …a system like this that everyone always pushes to the end is security.

Jason: [off camera] Always.

Dan: Where you think your use cases are driving the growth

Created 1 month ago.

3 videos

Tabular is a cloud-native warehouse and automation platform. It’s one central store for all your analytic data that can be used anywhere. We are operationalizing Apache Iceberg at an enterprise level. Learn more about Tabular and Iceberg on our channel.