Databricks
  • 2 956
  • 16 105 448
LakeFlow Demo
Databricks LakeFlow is a new solution that contains everything you need to build and operate production data pipelines. It includes new native, highly scalable connectors for databases including MySQL, Postgres, SQL Server and Oracle, and for enterprise applications like Salesforce, Microsoft Dynamics, NetSuite, Workday, ServiceNow and Google Analytics. Users can transform data in batch and streaming using standard SQL and Python.
Learn about Data Engineering: www.databricks.com/solutions/data-engineering
Views: 1,665

Videos

Say goodbye to messy JSON headaches with VARIANT
2.2K views · 18 hours ago
Try it out today on Databricks: docs.databricks.com/en/semi-structured/variant.html Read more about it on our blog: www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark If you're curious about the implementation check out the talk: ua-cam.com/video/jtjOfggD4YY/v-deo.html Or read about it on GitHub: github.com/apache/spark/blob/master/common/variant/README.md
Data Intelligence Day Seoul 2024
514 views · 23 hours ago
Data Intelligence Day Seoul, Korea took place on 23 April 2024 and gathered over 1,200 industry leaders and data and AI experts. Watch Data Intelligence Day Seoul On Demand: events.databricks.com/KoreaDIDays2024
An Introduction to DBRX
3.7K views · 1 day ago
Learn from Naveen Rao, VP of Generative AI at Databricks, as he explains DBRX, a new, open source foundation model that sets the standard for production quality and price/performance. With up to 3x faster inference, DBRX outperforms all other open models in quality benchmarks and allows enterprises to quickly build their own custom LLMs efficiently and with full control. Read more about ...
Demo: How Do I Use DBRX?
1.5K views · 1 day ago
Watch how DBRX uses Databricks to build and customize GenAI applications using your own enterprise data. Read more about DBRX here: www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms?
What's Next for Apache Spark™ Including the Upcoming Release of Apache Spark 4.0
6K views · 1 day ago
Reynold Xin, Co-founder and Chief Architect, Databricks, shares the latest innovation coming out of the Apache Spark™ open source project, including a preview of the anticipated release of Spark 4.0. Speakers: Reynold Xin, Co-founder and Chief Architect, Databricks; Tareef Kawaf, President, Posit Software, PBC
The Evolution of Delta Lake from Data + AI Summit 2024
1.8K views · 1 day ago
Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks explains why Delta Lake is the most adopted open lakehouse format. Includes: - Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta) - Delta Lake Liquid Clustering - Delta Lake production-ready catalog (Iceberg REST API) - The growth and strength of the Delta ecosystem - Delta Kernel - D...
Setting up PAT and Secret Scope
340 views · 1 day ago
Quick video on how to set up a Personal Access Token, a Secret Scope and a Secret with Azure Key Vault.
Increase your column sizes without rewriting the entire table
721 views · 1 day ago
Docs: docs.databricks.com/en/delta/type-widening.html
Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit
4.4K views · 1 day ago
Shant Hovsepian, CTO of Data Warehousing at Databricks, announced the biggest Delta Lake release to date, Delta 4.0, during the Data + AI Summit 2024 in San Francisco. Speaker: Shant Hovsepian, Chief Technology Officer of Data Warehousing, Databricks
Open Sourcing Unity Catalog Live Onstage with Matei Zaharia at Data + AI Summit 2024
887 views · 1 day ago
Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks. Matei Zaharia, Original Creator of Apache Spark™ and MLflow and Chief Technologist at Databricks, open sourced Unity Catalog live onstage at the Data + AI Summit 2024 in San Francisco.
Databricks LakeFlow: A Unified, Intelligent Solution for Data Engineering. Presented by Bilal Aslam
7K views · 1 day ago
Speaker: Bilal Aslam, Sr. Director of Product Management, Databricks. Bilal explains that everything starts with good data and outlines the three steps to good data: ingesting, transforming and orchestrating your data. Then Bilal announces Databricks LakeFlow - a unified solution for data engineering. With LakeFlow you can ingest data from databases, enterprise apps and cloud sources, ...
Recap of Announcements at Data + AI Summit 2024 with Ali Ghodsi, Co-Founder and CEO, Databricks
826 views · 1 day ago
Ali Ghodsi, Co-founder and CEO of Databricks, closes the 2024 Data + AI Summit with a recap of Databricks and open source innovation announced during the 4-day conference in San Francisco. Speaker: Ali Ghodsi, Co-founder and CEO, Databricks @Databricks
Announcing Databricks Clean Rooms with Live Demo. Presented by Matei Zaharia and Darshana Sivakumar
993 views · 1 day ago
Speakers: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks Darshana Sivakumar, Staff Product Manager, Databricks Organizations are looking for ways to securely exchange their data and collaborate with external partners to foster data-driven innovations. In the past, organizations had limited data sharing solutions, relinquishing control over how their ...
Data Sharing and Cross-Organization Collaboration. Presented by Matei Zaharia at Data + AI Summit
327 views · 1 day ago
Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks Summary: Data sharing and collaboration are important aspects of the data space. Matei Zaharia explains the evolution of the Databricks data platform to facilitate data sharing and collaboration for customers and their partners. Delta Sharing allows you to share parts of your table with third pa...
Announcing Unity Catalog Metrics with Live Demo. Matei Zaharia and Zeashan Pappa at Data + AI Summit
863 views · 1 day ago
Evolving Data Governance With Unity Catalog Presented by Matei Zaharia at Data + AI Summit 2024
2.6K views · 1 day ago
Unity Catalog Demo of New Features with Zeashan Pappa at Data + AI Summit 2024
905 views · 1 day ago
How Data Intelligence is Delivering Big Wins at Texas Rangers. Alexander Booth at Data + AI Summit
324 views · 1 day ago
The Future of Lakehouse Format Interoperability with Ali Ghodsi and Ryan Blue at Data + AI Summit
531 views · 1 day ago
How to Make Small Language Models Work. Yejin Choi Presents at Data + AI Summit 2024
3.3K views · 1 day ago
Data + AI Summit Keynote 2024 - Day 2 Opening Remarks with Ali Ghodsi
238 views · 1 day ago
Building an Enterprise Data & AI Catalog with Databricks Unity Catalog
1.2K views · 1 day ago
Patrick Wendell, Co-founder and VP of Engineering on Building Production-Quality AI Systems
1.4K views · 1 day ago
The Best Data Warehouse is a Lakehouse
4.1K views · 1 day ago
Building and Deploying GenAI Apps at Block with Jackie Brosamer, Head of AI, Data & Analytics
616 views · 1 day ago
Fei Fei Li, Professor, Stanford University on the History and Future of AI at Data + AI Summit 2024
16K views · 1 day ago
Building an Insights Factory at General Motors - Data + AI Summit 2024
434 views · 1 day ago
Jensen Huang, Founder and CEO of NVIDIA with Ali Ghodsi, Co-founder and CEO of Databricks
37K views · 1 day ago
Ali Ghodsi, Databricks Co-founder and CEO Closes the Keynote with a Summary of Product Announcements
323 views · 1 day ago

COMMENTS

  • @yao5261
    @yao5261 14 hours ago

    Got it, cyber pulse-taking!

  • @FullEvent5678
    @FullEvent5678 1 day ago

    Very inspiring! My mind is going at 1000 miles an hour with ideas for our startup and clients from this!

  • @subedi04
    @subedi04 1 day ago

    Where can I access your code or workbook? It would be nice to run your code.

  • @AadidevSooknananNXS
    @AadidevSooknananNXS 1 day ago

    Holden and team are incredibly engaging and very easy to understand!

  • @ia6906
    @ia6906 1 day ago

    Great feature. Please also include low-code features, as Data Factory has for ETL, to make it more beneficial.

  • @Naraharisettiraviteja
    @Naraharisettiraviteja 1 day ago

    awesome

  • @brento2890
    @brento2890 1 day ago

    Excellent presentation, beginning 3.5-4.0 Billion years ago and explaining all the way to now (AI, non-physical-spatial). Excellent. Thank you. 👏

  • @TheDataArchitect
    @TheDataArchitect 2 days ago

    Who's the speaker?

    • @Databricks
      @Databricks 1 day ago

      Holly Smith - FYI it's also me in the comments for my videos so fire away with any technical follow on questions - Holly

    • @TheDataArchitect
      @TheDataArchitect 1 day ago

      @@Databricks Awesome thanks

  • @muhammadibrahimabdullahi3840

    AI can do everything you need to do in times of studying and understanding AI.

  • @benim1917
    @benim1917 2 days ago

    Awesome 👏🏾

  • @Thegameplay2
    @Thegameplay2 2 days ago

    🎉

  • @gravenguan
    @gravenguan 2 days ago

    How does parse_json handle schema evolution? To my knowledge, parsing the schema on the fly is not recommended for production tables; it's safer to define the schema first.

    • @Databricks
      @Databricks 2 days ago

      I agree, but with a lot of JSON data you don't know the schema upfront and so can't define it. It's worth noting this is different from inferring the schema, which looks at the first 1000 rows and is brittle to upstream changes - Holly

    • @gravenguan
      @gravenguan 2 days ago

      @@Databricks We used parse_json for dev and exploration purposes as well, thanks for the clarification

    • @Databricks
      @Databricks 2 days ago

      @@gravenguan No worries! Hope this clarifies for other users too
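
The distinction drawn in the thread above, between inferring a schema from an initial sample and parsing each JSON value at read time, can be illustrated with a minimal plain-Python sketch. It is an analogy only and does not use or reproduce Spark's `parse_json` or the VARIANT implementation; `raw_rows`, `infer_columns` and `get_path` are hypothetical names, and the sample size of 2 stands in for the 1000 rows mentioned in the reply.

```python
import json

# Hypothetical raw events; the third record adds a field the first two lack.
raw_rows = [
    '{"id": 1, "device": {"os": "ios"}}',
    '{"id": 2, "device": {"os": "android"}}',
    '{"id": 3, "device": {"os": "ios", "model": "15 Pro"}}',
]


def infer_columns(rows, sample_size):
    """Infer a flat column set from only the first `sample_size` rows."""
    columns = set()
    for row in rows[:sample_size]:
        record = json.loads(row)
        for key, value in record.items():
            if isinstance(value, dict):
                # Flatten one level of nesting into dotted column names.
                columns.update(f"{key}.{sub}" for sub in value)
            else:
                columns.add(key)
    return columns


def get_path(row, path):
    """Parse one row at read time and walk a dotted path; None if absent."""
    value = json.loads(row)
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


# Sampling only the first two rows never sees "device.model".
sampled_schema = infer_columns(raw_rows, sample_size=2)

# Parsing every row at read time still finds it where it exists.
models = [get_path(row, "device.model") for row in raw_rows]
```

In this sketch the sampled column set misses the late-arriving field, while the read-time lookup returns it for the one row that has it and `None` for the others.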

  • @nagendrasrinivas-cj7sr
    @nagendrasrinivas-cj7sr 2 days ago

    this is clearly copied from snowflake

    • @Databricks
      @Databricks 2 days ago

      Variants in their various forms have been around for many decades. We're big fans of open source so anyone can use the implementation in other projects or products.

  • @TheDataArchitect
    @TheDataArchitect 2 days ago

    That's awesome.

  • @matthiasmueller9340
    @matthiasmueller9340 2 days ago

    How can I specify the required runtime version when using serverless sql warehouse?

    • @Databricks
      @Databricks 2 days ago

      Variant types will be coming to serverless early/mid July, no need to select a runtime - Holly

  • @afrikaniz3d
    @afrikaniz3d 2 days ago

    Only note for these videos, since they're not Shorts, is that it would be more beneficial to use the full wide (1920 x 1080) format, so it's more readable at all resolutions.

    • @Databricks
      @Databricks 1 day ago

      I completely hear you, trying to figure out the best way to film for multiple platforms at once when some define 'short' as <10 mins and YouTube graces me with a mere 60 seconds - Holly

  • @EranM
    @EranM 3 days ago

    Can't you get the score (ranking score | similarity score) while fetching items from the Vector DB? ..

  • @EranM
    @EranM 3 days ago

    Can someone explain to me why you calculate USER embeddings when training, but when searching for similar embeddings you actually get ITEM embeddings?

  • @LQDEN
    @LQDEN 3 days ago

    Still didn't explain what it is exactly

  • @gybob100
    @gybob100 4 days ago

    The shovel company telling you how valuable the gold is

  • @user-he1hs5vx3d
    @user-he1hs5vx3d 4 days ago

    She is creepy because she is not an honest person. She keeps stealing others works and ideas to pretend she is an expert. To make her greater, she belittles others, including her student (5:30).

  • @jianguo8233
    @jianguo8233 5 days ago

    Is 4.0 a release or preview today?

  • @uchechukwumadu9625
    @uchechukwumadu9625 5 days ago

    Insightful!

  • @slavenlulic7736
    @slavenlulic7736 5 days ago

    Powerful

  • @SnatrWhamo
    @SnatrWhamo 6 days ago

    Great video and very very useful! While implementing, I got stuck uploading the PDF to a Volume in the Unity Catalog. I am the "Owner" of my Databricks Workspace and Azure account, although I don't seem to have the option to add a Volume to a Catalog and thus don't have the option to add the PDF to a Volume. This seems to have to do with permissions and possibly setting up a metastore between Databricks and Azure Blob Storage? Might you have any insights, ideas, solutions or workarounds? Thanks again for a great video and all the resources to implement this super useful technology!

    • @jasondrew2087
      @jasondrew2087 6 days ago

      Couple of things, you need USE SCHEMA and CREATE VOLUME permissions on the Schema and USE CATALOG on the catalog. Also you need CREATE EXTERNAL VOLUME permissions on the External Location you plan on using for your Volume.
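
The permissions listed in the reply above could be granted with statements along these lines. This is a sketch, assuming Unity Catalog's `GRANT ... ON ... TO ...` SQL form; the catalog, schema, external location and principal names are hypothetical placeholders, and the Python helper `volume_grants` only assembles the statement text for an administrator to review and run.

```python
def volume_grants(catalog, schema, external_location, principal):
    """Build GRANT statements for the privileges named in the reply above."""
    # Backticks quote principals that contain special characters,
    # such as e-mail addresses.
    grantee = f"`{principal}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {grantee}",
        f"GRANT USE SCHEMA, CREATE VOLUME ON SCHEMA {catalog}.{schema} TO {grantee}",
        f"GRANT CREATE EXTERNAL VOLUME ON EXTERNAL LOCATION {external_location} TO {grantee}",
    ]


# All four names below are illustrative assumptions, not values from the video.
statements = volume_grants("main", "docs", "pdf_landing", "user@example.com")
for statement in statements:
    print(statement + ";")
```

Whether all three grants are needed depends on the kind of Volume: the external-location privilege applies only when the Volume is backed by an External Location, as the reply notes.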

  • @BlizzardzRS
    @BlizzardzRS 6 days ago

    While I appreciate the contributions Databricks makes to the open source community, *this video is incredibly misleading*. DBRX is *not* the highest production quality open-source model nor the best in price per performance. The graph you showed is incredibly misleading, not least because you compared your models to LLaMa2-70B. No one in their right mind at the time of this video's recording is using LLaMa2-70B. Everyone has moved on to LLaMa3, with many providers even disabling LLaMa2 on their platforms because it is more expensive and less performant than LLaMa3. A fairer comparison would be between DBRX and LLaMa3-70B and LLaMa3-8B. You didn’t show that because DBRX gets roasted in these comparisons. You talked about the cost associated with training your LLMs and how the cost has come down substantially. Really, this is an argument that the $10M Mosaic/Databricks have spent on DBRX is already redundant. You guys are losing credibility by posting stuff like this. Databricks does some great work. Don’t tarnish your reputation with borderline fraudulent content like this.

  • @georges7298
    @georges7298 6 days ago

    Thanks - for the open sourcing, and for the summit.

  • @BeginnerAlchemist
    @BeginnerAlchemist 6 days ago

    I have a question: why do we research small LMs just to avoid using GPUs? If we want to save money on training, we could research how to make GPUs or models more efficient, rather than avoid using higher tech.

    • @DamaruM
      @DamaruM 6 days ago

      GPU= power consumption

    • @tulikabose5120
      @tulikabose5120 3 days ago

      It's not just for GPUs...Small-LM has its own market for on-device or on-edge processing, where there are concerns of privacy and customers would not want their data to go to clouds, and secondly in many industrial use-cases where internet and cloud access isn't accessible due to the remote nature of the use-case, and model inference needs to be done on device...The demand for SLMs is increasing in such use cases...Many big tech companies are not just working on LLMs but also on SLMs under the hood as both of them have to co-exist to cater to different user requirements.

    • @BeginnerAlchemist
      @BeginnerAlchemist 2 days ago

      @@tulikabose5120 Thank you, I see. It is useful for small devices with limited compute hardware and for privacy. That's true. Many LLMs need huge amounts of data to train, and collecting people's private information to make them stronger is something most people hate.

  • @mc.pretzel
    @mc.pretzel 7 days ago

    Boomshakalaka!

  • @plartoo
    @plartoo 7 days ago

    :D Show us how to do more complex data transformations than just the simple join you demo-ed and what the actual limitations are (because that's where the reality meets the demo). While you are at it, tell us how to automate (schedule) this pipeline and set up notifications and data quality checks. Next, let us know how to QA that dashboard you let GenAI create (to make sure it's not hallucinating and spitting out bullshit while destroying our firm's reputation), and how to surface it to customers via URL in a secure way (without paying you through our noses). Finally, tell us how much it costs to process GBs of data per month. This is an unbearably condescending demo that assumes the attendees are stupid and don't know what is entailed in serious, real-world data wrangling. And I know a couple of my clients who are leaving Databricks because they are freaking expensive.

    • @ser1ification
      @ser1ification 3 days ago

      Exactly. I’m tired of these hype machines. Everything is in beta. Customers are the beta testers. Only thing these guys did good is the Unity Catalog. Of course Spark and Delta as well.

  • @gopi4841
    @gopi4841 7 days ago

    Nice one, Darshana.

  • @xiaoyu2270
    @xiaoyu2270 7 days ago

    jensen from china wenzhou

    • @chima6291
      @chima6291 2 days ago

      bullshit. He was born in Taiwan

  • @forrestbajbek3900
    @forrestbajbek3900 8 days ago

    Wow, this is a huge improvement.

  • @AleksandarKrumov-pm4tk
    @AleksandarKrumov-pm4tk 8 days ago

    wow

  • @cobrider2
    @cobrider2 8 days ago

    2 reactions: - by querying the table with duckdb, the authentication and permission is handled only by Unity Catalog, and not by the underlying storage solution (AWS S3, Azure ADLS, ...). right ? - Applying column masks will only work for hosted compute like the databricks clusters, because querying with a local self hosted compute like DuckDB requires to download the parquet files (containing the PII data) locally then only execute the query... meaning you actually have PII data downloaded on your local machine. right ?

  • @cobrider2
    @cobrider2 8 days ago

    had a laugh, thank you

  • @subhroitmecse
    @subhroitmecse 8 days ago

    Examples are not clear about Delta lake ACID properties.

  • @Clammer999
    @Clammer999 8 days ago

    One of my favourite AI legends. Her passion for humanity and how AI can be leveraged to help improve people’s lives is admirable and astounding.

  • @WonkaTruck
    @WonkaTruck 8 days ago

    I still can't read Iceberg in Databricks, stop hoping for adoption and just fix that...

  • @DCC72
    @DCC72 8 days ago

    And from nothing, a college professor just evolved from the bacteria. Rubbish.

  • @sunnychabbi3639
    @sunnychabbi3639 8 days ago

    Pls provide notebook. It is not available in dbdemos

  • @GerardInnes
    @GerardInnes 8 days ago

    As a new YouTuber you're doing very well. He teaches us option trading nicely. Just need to be consistent with this process of trading on binary options...

  • @henryebube3576
    @henryebube3576 9 days ago

    I followed your tutorial and got stuck at 9:38. I type databricks-bge-large-en as the embedding model, but the Create button is disabled and I'm not sure why.

    • @jasondrew2087
      @jasondrew2087 8 days ago

      You shouldn't have to type it in, rather it should be an option in the drop down. If you go to Serving do you see it listed as a Foundational model?