The Databricks Data and AI Summit 2025 has just wrapped up in San Francisco, drawing 22,000 attendees to the city and 65,000 in-person and virtual attendees in total, making it one of the largest data and AI conferences globally.
The pervasive theme this year was Data Intelligence, which can be distilled down to the democratisation of data and AI – a clear strategic move to bridge the gap between business needs and the complex reality of end-to-end development of AI solutions.
Let’s dive into the key product announcements and their implications for the future of data and AI in the Australian market and beyond.
Day one: all eyes on the future
Databricks Free Edition: lowering the barrier to entry
One of the early announcements was the Databricks Free Edition, backed by a USD 100 million investment in open-source training. This move makes Databricks accessible to anyone, effectively giving away a “thin slice” of their platform for free.
Even though it will be a ‘very small’ instance of the product, this is a smart strategy for Databricks, particularly as they aim to expand their global footprint beyond a single-digit market share in many countries. By offering a free sample, they are fostering widespread adoption and familiarity with their platform.
This initiative mirrors the successful strategy employed by Tableau in the early 2010s, where offering a free trial version significantly boosted market penetration. For Australian businesses and educational institutions, this means unprecedented access to powerful data and AI self-paced learning, helping to develop and democratise data and AI literacy.
Lakebase: challenging traditional transactional databases for the AI era
The transactional database space has remained largely unchanged for decades, often characterised by high costs, vendor lock-in, and on-premise infrastructure that hinders real-time AI use cases. Lakebase emerges as a disruptive force, creating a new category of open-source databases that separate compute and storage, purpose-built for AI. The core problem Lakebase addresses is the prohibitive cost and complexity of rapidly experimenting with vast transaction datasets.
Inspired by software engineering’s code branching concept, Lakebase allows users to “branch off” production data for experimentation, only incurring costs for the changes made to that data. This “pay-as-you-go” model for data experimentation might prove to be a game-changer, especially in the age of AI agents. Imagine each agent having its own experimental data branch, eliminating the potentially huge costs typically associated with agents interacting with traditional transactional databases.
The recent acquisition of Neon underscores this shift: Neon’s CEO revealed that 80% of new databases on Neon are created by agents (four times the rate at which humans create them), a sign that engineering practices are being revolutionised. By unifying operational and analytical data, Lakebase aims to empower businesses to make rapid, informed decisions by seamlessly combining multiple data sources (e.g. transactional data with forecast data).
“Now, with Lakebase, we’re creating a new category in the database market: a modern Postgres database, deeply integrated with the lakehouse and today’s development stacks.
“As AI agents reshape how businesses operate, Fortune 500 companies are ready to replace outdated systems. With Lakebase, we’re giving them a database built for the demands of the AI era.”
Ali Ghodsi, Co-founder and CEO, Databricks
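To make the branch-per-agent idea concrete, here is a minimal sketch of what that workflow could look like. Lakebase’s actual API was not shown in detail, so the create_branch helper and connection details below are hypothetical stand-ins; the only firm assumption is Postgres compatibility, hence the standard psycopg2 driver.

```python
# Illustrative sketch only: the Lakebase branching API was not shown in detail,
# so create_branch() and the endpoint naming below are hypothetical stand-ins.
# Lakebase is Postgres-compatible, so a standard Postgres driver is assumed.
import psycopg2


def create_branch(parent: str, branch_name: str) -> str:
    """Hypothetical helper: in practice the copy-on-write branch would be created
    through the platform's API or CLI, and only the delta written to the branch
    would incur cost. Here we simply return a placeholder connection string."""
    return f"postgresql://agent_user:secret@{branch_name}.lakebase.example.com:5432/{parent}_copy"


def run_agent_experiment(agent_id: str) -> None:
    # Each agent gets its own isolated branch of production data.
    dsn = create_branch(parent="prod", branch_name=f"agent-{agent_id}")
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # The agent can mutate data freely without touching production.
        cur.execute("UPDATE orders SET status = 'simulated' WHERE status = 'pending'")
        cur.execute("SELECT count(*) FROM orders WHERE status = 'simulated'")
        print(agent_id, cur.fetchone())
```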
Databricks Apps: bringing AI applications closer to the data
With the rise of “vibe coding”, creating an appealing UI could be seen as “the easy part”, but securely connecting apps to data and AI for production-grade deployment remains a significant hurdle for many businesses. Databricks Apps, now generally available after a 12-month private preview with 25,000 customers, directly addresses this challenge. The core innovation here is bringing applications closer to the data within the Databricks environment, effectively streamlining the development of production-ready AI applications.
During the keynote, a live demonstration showcased the ability to query data in real time using natural language, with agents inferring visualisations based on user requests. The user was then able to manually manipulate these visualisations – moving widgets, resizing graphs, and even customising colours and fonts. This could fundamentally change how businesses create dashboards and derive insights, reducing reliance on large development teams.
Agent Bricks: demystifying and democratising AI agent development
The conversation around AI agents at the Summit, including insights from Dario Amodei, CEO of Anthropic, highlighted both their immense potential and the challenges of productionising them. Key concerns include performance monitoring, hallucination detection, understanding the underlying mechanisms, navigating the plethora of techniques and models, and balancing cost versus quality. This sets the stage perfectly for Agent Bricks – a product that automates the development of agentic frameworks. A potentially very exciting innovation; the only challenge is that there is not yet a release date for the ANZ market!
Agent Bricks provides pre-built, problem-specific agent components that solve specific use cases such as information extraction, knowledge assistance, multi-agent supervision, and custom chatbots. This building-block approach simplifies the development of end-to-end, production-ready enterprise AI products, making them accessible even to those without deep coding expertise.
A standout feature is the intelligent selection and optimisation of models. Users can describe a high-level problem, and Agent Bricks will leverage an LLM to score and recommend the best models, providing visualisations of cost versus quality tradeoffs. This feedback loop allows for continuous learning and improvement, truly optimising agent performance. The live demo, showcasing an agent building a new soft drink product plan, was impressive. It demonstrated the ability to train agents on domain-specific jargon, visualise their reasoning process (effectively “showing their work”), and generate comprehensive business plans by orchestrating multiple specialised agents (e.g., R&D and marketing agents).
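As a rough illustration of the cost-versus-quality trade-off described above (and not how Agent Bricks itself works), the selection step can be thought of as filtering candidate models against a budget and then maximising a judged quality score. All model names, scores and prices below are made up.

```python
# Toy illustration of a cost-vs-quality selection step; the model names, scores and
# prices are made up, and this is not Agent Bricks' actual evaluation logic.
candidates = [
    # (model name, judged quality score 0-1, cost per 1M tokens in USD), all hypothetical
    ("small-fast-model", 0.72, 0.40),
    ("mid-tier-model", 0.84, 2.00),
    ("frontier-model", 0.91, 15.00),
]

budget_per_million_tokens = 5.00

# Keep only models within budget, then pick the highest-quality one.
affordable = [m for m in candidates if m[2] <= budget_per_million_tokens]
best = max(affordable, key=lambda m: m[1])
print(f"Recommended: {best[0]} (quality {best[1]:.2f}, ${best[2]:.2f}/1M tokens)")
```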
Day two: emphasising enterprise needs
Unity Catalog: unifying data governance for trust and consistency
Day two of the Summit homed in on data engineering, with a strong emphasis on simplifying data ingestion, ensuring data trustworthiness, and leveraging it for downstream value. Unity Catalog, already adopted by 97% of Databricks customers, emerged as a central theme, reinforcing its role as a unified governance system and a rapidly expanding ecosystem. The core takeaway was the enhanced ease of accessing diverse data sources and formats (hello Iceberg!) within Unity Catalog, with features like consistent read and write rules ensuring trust and consistency across all systems. The open-source nature of Unity Catalog is a shrewd move, further solidifying Databricks’ market traction in providing a single location for data storage and governance.
A particularly impactful announcement was the expansion of Unity Catalog for business users, aligning with the broader theme of data and AI democratisation. This addresses a common pain point: the disconnect between business users and data platforms.
With new features like Business Metrics, organisations can establish consistent business definitions for metrics and calculations across all use cases – from BI dashboards to AI models. This aims to eliminate discrepancies and speed up decision-making.
The user-friendly UI, natural language querying, and automated quality monitoring (e.g., flagging stale tables with a “freshness” indicator) further empower business users, making data governance intuitive and accessible. For Australian organisations struggling with data silos and inconsistent metrics, Unity Catalog offers a robust solution for building trust in data and fostering a truly data-driven culture.
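The exact Business Metrics definition syntax was not covered in depth, but the underlying idea, one governed definition that every dashboard, notebook and model reads, can be sketched today with a plain Unity Catalog view. The catalog, schema and column names below are made up, and the snippet assumes a Databricks environment where Unity Catalog is enabled.

```python
# Illustrative only: the new Business Metrics feature has its own definition syntax,
# which is not reproduced here. This sketch shows the underlying idea of a single
# governed metric definition, using an ordinary Unity Catalog view.
# Catalog, schema and column names are made up; assumes a Databricks runtime.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE OR REPLACE VIEW main.finance.monthly_net_revenue AS
    SELECT
        date_trunc('month', order_date)         AS revenue_month,
        sum(gross_amount - discounts - refunds) AS net_revenue
    FROM main.sales.orders
    GROUP BY date_trunc('month', order_date)
""")

# BI dashboards, notebooks and AI features can all query the same definition:
spark.table("main.finance.monthly_net_revenue").show()
```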
Lakeflow and Lakeflow Designer: simplifying ingestion and democratising ETL with no-code simplicity
The final set of major announcements on the second day focused on simplifying data ingestion and removing current pain points of getting data into the Databricks platform in a usable format. There were several announcements in this space and, more broadly, a clear signal that Databricks is expanding its reach from ingestion, through orchestration and modelling, out to visualisations and end-user interactions. As a strategy, this may start to rub a few of their vendor partners (such as dbt, Fivetran and Qlik/Talend) the wrong way. A few notable announcements stood out.
Firstly, Databricks is open-sourcing DLT as Spark Declarative Pipelines, dispelling concerns about vendor lock-in. This is aligned with their organisational strategy of open source and open formats, but it also formally declares their position of continuing to compete against dbt (interestingly, during the Snowflake summit the week before, one of the key announcements was that dbt will be natively embedded in Snowflake).
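For readers who have not used the framework, a declarative pipeline looks roughly like the snippet below, written against the DLT Python API as it exists on Databricks today; module and decorator names may shift as the open-source Spark Declarative Pipelines project lands. The table names and storage path are made up.

```python
# A minimal declarative pipeline using the DLT Python API as it exists on Databricks
# today; naming may change as the framework is open-sourced as Spark Declarative
# Pipelines. Table names and the storage path are made up. The `spark` session is
# provided by the pipeline runtime.
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Raw orders landed from cloud storage")
def orders_bronze():
    # Incrementally ingest new files; the engine manages checkpoints and retries.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders/")
    )


@dlt.table(comment="Cleaned orders ready for analytics")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_silver():
    # Declare the dependency; the framework infers the execution graph.
    return dlt.read_stream("orders_bronze").withColumn(
        "ingested_at", F.current_timestamp()
    )
```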
Secondly, significant improvements in real-time data processing will enable a wider range of low-latency use cases: a new real-time mode was announced for Lakeflow Pipelines, dramatically reducing the latency traditionally associated with micro-batch processing.
Thirdly, Databricks has also expanded the number of available connectors in Lakeflow Connect to cover sources like SQL Server, ServiceNow and Google Analytics, which raises some questions around existing tooling providers such as Fivetran. It will be interesting to see what happens in this space, as they are clearly targeting all the major data sources and leaving the smaller, niche ones for Fivetran and other vendors.
Lastly, the unveiling of Lakeflow Designer – a no-code, natural language tool – promises to democratise ETL for everyone. Lakeflow Designer allows users to intuitively combine data from multiple sources with minimal or no coding. The underlying code still exists but is abstracted, making data manipulation accessible to business users who previously relied on data engineers.
The demo showcased the tool’s ability to interpret natural language commands and even identify potential issues in data manipulation through an AI assistant. An impressive feature was the ability to upload an example of a desired output table (e.g. a screenshot of a sales conversion table), and Lakeflow Designer would automatically transform the input data to match that structure. This functionality simplifies complex data transformations, making sophisticated ETL processes available to a much broader audience.
For businesses, Lakeflow Designer offers the potential to accelerate data preparation and democratise access to insights, reducing bottlenecks and empowering more individuals to work directly with data.
Changing the conversation on data formats: Iceberg support in Unity Catalog
Databricks now supports Apache Iceberg alongside Delta Lake, letting customers create, manage, and access Managed Iceberg Tables from Databricks or external engines via the Iceberg REST Catalog API. The on-stage presentation jokingly mentioned that Iceberg tables created and managed by Snowflake are actually faster when accessed through Unity Catalog.
Unity Catalog now uniquely supports reads and writes from both Databricks and external engines for managed Iceberg tables, with auto-optimisation and performance gains delivered through Predictive Optimization. A demo was shown using the Iceberg REST Catalog API through DuckDB for reads and writes, which was impressive. This really closes the gap between the two major open formats and simplifies the lives of users often working across both.
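To give a sense of what an external-engine read looks like, the snippet below reads an Iceberg table from DuckDB using its iceberg extension. The table location is a placeholder, and the REST catalog attach flow from the demo is not reproduced here because its exact syntax depends on the DuckDB and extension versions in use.

```python
# A small external-engine read of an Iceberg table using DuckDB's iceberg extension.
# The table path is a placeholder; the REST-catalog attach flow shown in the demo is
# not reproduced here, as its syntax varies across DuckDB and extension versions.
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")

# iceberg_scan() reads an Iceberg table given its storage location
# (a local path here; cloud storage would also need credentials configured).
result = con.execute(
    "SELECT count(*) FROM iceberg_scan('/path/to/warehouse/sales/orders')"
).fetchone()
print(result)
```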
An exciting year ahead for Databricks
Overall, the Databricks Summit 2025 showcased a maturing platform intent on expanding its relevance across the full data and AI lifecycle—from ingestion and governance to application development and agent orchestration. While some announcements, such as Lakebase and Agent Bricks, hint at bold bets on the future, others like Unity Catalog enhancements and no-code ETL tools respond to more immediate enterprise needs.
We see clear benefits for Australian organisations looking to modernise their data capabilities—but we also recognise that Databricks’ growing product breadth may raise questions around ecosystem overlap and implementation complexity. We’re looking forward to seeing some of these big bets play out as these features are released later in the year.