
Lilac - Better data, better AI
“Lilac is an incredibly powerful tool for data exploration and quality control. We use Lilac daily to inspect and evaluate datasets, and then democratize them across the org. It is a critical part of our data quality evaluation pipeline.”
Lilac
Better data, better AI. Lilac is an open-source tool that enables data and AI practitioners to improve their products by improving their data. Try Lilac on HuggingFace Spaces, where we’ve preloaded popular datasets like OpenOrca. Try a semantic search for “As a language model” on the OpenOrca dataset!
Quick Start - Lilac
Lilac’s sweet spot is ~100K-1M rows of data, although up to 10 million rows are possible. This quickstart uses 10,000 rows so that clustering and embedding operations finish locally in 10-20 minutes even without a GPU. Lilac Garden can help speed up …
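If your own dataset is much larger than the 10,000 rows the quickstart uses, one way to get a comparable subset is to sample it down first. The following is a minimal sketch using only the Python standard library, not part of Lilac's API: the `reservoir_sample` helper, the row shape, and the 250,000-row stand-in dataset are all assumptions made for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k rows chosen uniformly at random in a single pass.

    Hypothetical helper (classic reservoir sampling), not a Lilac function.
    It never holds more than k rows in memory, so it works on streams.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Stand-in for a larger dataset; replace with your own row iterator.
rows = ({"id": i, "text": f"document {i}"} for i in range(250_000))

# 10,000 rows matches the size the quickstart uses.
subset = reservoir_sample(rows, k=10_000)
print(len(subset))
```

The fixed seed keeps the subset reproducible between runs, which helps when comparing clustering results on the same sample.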
Introducing Lilac - Lilac
Aug 21, 2023 · Create and refine Lilac Concepts, which are customizable AI models that can be used to find and score text that matches a concept you have in mind. Download the results of the enrichment for downstream applications.
Joining Databricks - Lilac
Mar 19, 2024 · We believe that bringing the real-time, interactive data curation experience of Lilac to Databricks’ enterprise-scale platform will enable businesses to have much more visibility and control over their unstructured data. This will enable world-class, customizable AI products that serve end-users.
About - Lilac
Lilac is excited to announce that we are joining Databricks. Our Team: Daniel Smilkov, Co-Founder & CEO. Co-led TensorFlow.js and Know Your Data at PAIR, with a focus on ML and visualization. ...
Curate a coding dataset with Lilac - Lilac
Dec 7, 2023 · At Lilac, we also believe that having more eyes on data ultimately leads to fundamental discoveries of how a model will behave, giving the developer more control of their downstream AI product. In this blog post, we’ll delve into the excellent Glaive coding assistant dataset with the goal of fine-tuning a code assistant model.
Cluster a dataset - Lilac
Lilac Garden uses powerful GPUs to embed, cluster and annotate datasets up to 100x faster than on device. We can cluster a million documents in ~20-30 mins.
100x Faster Clustering with Lilac Garden
Jan 30, 2024 · For datasets larger than 10k rows – or if you’re impatient, like us – Lilac Garden is a remote computation service that powers compute-heavy features like clustering, perplexity scoring, and embedding computation. Lilac Garden clustered our largest datasets of 4 million data points in just an hour.
Explore a dataset (UI) - Lilac