Buidl in the Open #3 - Advancing Solidity Code Evaluation

This week we delve into the evolution of our training data and the meticulous process behind our evaluation benchmarks, highlighting Spectral's machine-learning team. 🙌

🔵 Training Data Evolution

Version 1
We began with a dataset drawn from 30 million smart contracts, sourced from the @zellic_io Smart Contract Fiesta and all verified contracts deployed on the Ethereum network after March 2023 with Solidity version 0.8.15 and above. This dataset supported the initial development of our large language models (LLMs), augmented by Retrieval-Augmented Generation (RAG) and synthetic prompts detailing code usage for fine-tuning. Documentation and code from top DeFi projects further enriched the dataset.

Version 2
We expanded the dataset to more than three times the number of unique contracts in V1, for a total of 124,778 unique contracts. By broadening our criteria to contracts with Solidity version 0.8.0 and above, we captured a wider array of contracts, yielding a more comprehensive dataset that better supports our RAG process (a minimal sketch of this version filter appears below).

Version 3
Building on the synthetic prompts from V1, we used LLMs to generate more detailed prompts, including inline comments that explain the purpose, inputs, outputs, and other relevant details of the code (an illustrative prompt template appears below). This enhancement substantially improved the dataset, enriching both our RAG pipeline and our fine-tuning process. Each iteration marks a significant step toward unparalleled efficiency in Solidity code evaluation.

🔵 Evaluation Benchmarks and Process

Our evaluation framework is inspired by the HumanEval dataset, which comprises 164 programming problems designed to assess code-generation models. Recognizing the limitations of applying this Python-centric dataset to Solidity, we crafted a specialized benchmark of over 80 handwritten Solidity problems. These problems test code correctness and gas optimization, providing a solid foundation for our evaluation system (a sample problem entry and the standard pass@k scoring metric are sketched below).

Our mission is to develop a robust evaluation system for LLMs in the context of Solidity. This involves creating a diverse set of prompts for each problem, running both manual and AI-generated tests to verify functionality and efficiency, and analyzing gas costs and vulnerabilities. Automated evaluation criteria are central to the project, ensuring a balanced and fair assessment of LLM performance. Through continuous testing and feedback, we are dedicated to refining our AI-driven Solidity evaluation process.

Stay tuned for further updates as we continue to build in the open. 🦾
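To picture the Version 2 filtering criteria, here is a minimal sketch of screening verified sources by pragma version. The `MIN_VERSION` constant, the `contracts` input format, and the helper names are assumptions for illustration, not Spectral's actual pipeline:

```python
import re

# Minimum compiler version used in Version 2 of the dataset (0.8.0 and above).
MIN_VERSION = (0, 8, 0)

# Matches the first "pragma solidity" directive, e.g. "pragma solidity ^0.8.15;".
PRAGMA_RE = re.compile(r"pragma\s+solidity\s*[\^>=]*\s*(\d+)\.(\d+)\.(\d+)")

def meets_version(source: str, floor=MIN_VERSION) -> bool:
    """Return True if the first Solidity pragma in `source` is at or above `floor`."""
    match = PRAGMA_RE.search(source)
    if not match:
        return False  # missing or unparseable pragma: exclude from the dataset
    version = tuple(int(part) for part in match.groups())
    return version >= floor

def filter_contracts(contracts: dict[str, str]) -> dict[str, str]:
    """Keep only verified sources whose pragma meets the version floor.

    `contracts` maps a contract address to its verified Solidity source.
    """
    return {addr: src for addr, src in contracts.items() if meets_version(src)}
```

A real pipeline would also handle version ranges and multi-file contracts; the tuple comparison here is the simplest correct check for exact three-part pragmas.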
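The Version 3 synthetic prompts could be produced with a template along these lines. The `build_annotation_prompt` helper and its exact instruction wording are hypothetical; only the goal (inline comments covering purpose, inputs, and outputs) comes from the post:

```python
def build_annotation_prompt(solidity_source: str) -> str:
    """Build an instruction asking an LLM to annotate a contract with inline comments."""
    return (
        "You are a senior Solidity developer. Rewrite the contract below, "
        "adding inline comments that explain the purpose of each function, "
        "its inputs and outputs, and any notable gas or security considerations. "
        "Do not change the code itself.\n\n"
        "```solidity\n"
        f"{solidity_source}\n"
        "```"
    )
```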
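One way to picture an entry in the 80-problem Solidity benchmark: a natural-language prompt, a handwritten reference solution, unit tests, and a gas budget. The field names, the task numbering, and the concrete gas cap are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class SolidityProblem:
    """A single benchmark entry, in the spirit of a HumanEval task."""
    task_id: str                 # e.g. "solidity/007" (hypothetical numbering)
    prompt: str                  # natural-language spec given to the model
    reference_solution: str      # handwritten Solidity used as the oracle
    unit_tests: list[str] = field(default_factory=list)  # tests run on the candidate
    gas_budget: int | None = None  # optional cap; exceeding it fails the gas check

EXAMPLE = SolidityProblem(
    task_id="solidity/007",
    prompt="Write a function `sum(uint256[] calldata xs)` returning the element sum.",
    reference_solution=(
        "// SPDX-License-Identifier: MIT\n"
        "pragma solidity ^0.8.0;\n"
        "contract Sum {\n"
        "    function sum(uint256[] calldata xs) external pure returns (uint256 s) {\n"
        "        for (uint256 i = 0; i < xs.length; ++i) s += xs[i];\n"
        "    }\n"
        "}\n"
    ),
    gas_budget=50_000,
)
```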
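Because the benchmark follows HumanEval, scoring would plausibly use the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021). Whether Spectral reports pass@k is an assumption; the formula itself is standard:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper.

    n: total samples generated for a problem
    c: samples that pass every unit test (and, here, any gas budget)
    k: number of samples the user is allowed to draw
    """
    if n - c < k:
        return 1.0  # every size-k draw contains at least one passing sample
    # 1 - C(n-c, k) / C(n, k), computed stably as a running product
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))
```

For example, `pass_at_k(n=20, c=5, k=1)` returns 0.25: when a quarter of samples pass, a single draw succeeds 25% of the time.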
@Spectral_Labs Love the process, waiting for more updates. I’m super encouraged by what I see on @Spectral_Labs 👏🏻🔥
@Spectral_Labs Can't wait to see the continued progress and updates. Keep up the fantastic work, @Spectral_Labs! 🚀 Btw, happy to connect you to our network of KOLs and alpha groups that can help with your token sale. Always love to support growing projects like yours!
@Spectral_Labs Spectral’s progress in the past few weeks has been phenomenal, keep up the good work 👏
@Spectral_Labs I've always loved this team; massive progress I see here!
@Spectral_Labs Impressive progress! Your dedication to refining Solidity code evaluation is evident. Looking forward to seeing the continued advancements and updates. Keep up the great work! 👏 #BuidlInTheOpen 🚀👏
@Spectral_Labs Fantastic insights into the evolution of Spectral's training data and the meticulous process behind their Solidity code evaluation benchmarks 👌🏾