Large Language Model (LLM) Application Architecture

2023-06-23 · #LLM · a16z

Large language models are powerful new primitives for building software; however, it's not always obvious how to use them.
A16z published an article this week focusing on the application architecture of large language models (LLMs): https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/

The article provides a detailed analysis of the systems, tools, and design patterns commonly used by startups and large companies when designing and implementing LLM applications. As a non-technical reader, I attempted to understand the architectural design thinking behind these applications.

[Figure: the emerging LLM application stack]

The figure above is mainly based on the "in-context learning" design pattern.

So, first we need to understand what "in-context learning" is. You can refer to the relevant entry on Wikipedia: https://en.wikipedia.org/wiki/In-context_learning_(natural_language_processing)

In-context learning describes how a model predicts or generates subsequent text based on the given textual context. In other words, the model uses information from the preceding text to produce relevant and appropriate responses. This mimics human conversational habits: we always rely on the context of the conversation to decide what to say next.

For example, suppose I ask, "How's the weather today?" You might answer based on your location and the time, say "It's sunny today" or "It's raining today." That is using context (the current weather conditions) to respond to my question.

Therefore, "in-context learning" here refers to the ability of language models to generate appropriate responses based on the given conversational context.

There are mainly three ways to use large language models (LLMs):

1. Training your own model from scratch
2. Fine-tuning an open-source model
3. Directly using APIs
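The weather example above translates directly into an in-context prompt. The sketch below (`build_icl_prompt` is a made-up helper, not a library function) compiles few-shot examples and a new question into a single prompt string that could be sent to any chat-completion API:

```python
# Minimal sketch of in-context learning via prompting: instead of updating
# model weights, task examples and context are placed directly in the prompt.

def build_icl_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Compile few-shot examples plus a new question into one prompt string."""
    lines = ["Answer the question in the same style as the examples.", ""]
    for q, a in examples:
        lines += [f"Q: {q}", f"A: {a}", ""]
    lines += [f"Q: {question}", "A:"]
    return "\n".join(lines)

examples = [
    ("How's the weather today?", "It's sunny today."),
    ("How's the weather tomorrow?", "It will rain tomorrow."),
]
prompt = build_icl_prompt(examples, "How's the weather this weekend?")
print(prompt)
```

The model then completes the final "A:" line; everything it needs to imitate is carried in the context window rather than in its weights.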
In-context learning mainly applies to the third approach. Although directly using APIs is more convenient than training a model from scratch or fine-tuning one, the cost of an API call grows with the length of the prompt, so using in-context learning efficiently becomes very important. You can refer to another a16z article for deeper background: https://a16z.com/2023/05/25/ai-canon/

The article breaks the entire workflow down into three main steps:

1. Data preprocessing/embedding: in this stage, private data (using legal documents as an example) is processed and stored for later retrieval. This usually involves splitting documents into smaller chunks, running them through an embedding model, and then storing the results in a special database called a vector database. I have explained this part in detail in a previous article.
2. Prompt construction/retrieval: when a user submits a query, the application compiles prompts from a template, the user's question, and relevant documents retrieved from the vector database.
3. Prompt execution/inference: after the prompts are compiled, they are submitted to a pre-trained LLM for processing, whether a proprietary model API or an open-source or self-trained model. Some developers also add operational systems at this stage, such as logging, caching, and validation. I did not cover this part in previous articles, but may write a dedicated piece on it later.

Let's analyze each of these three steps in detail.
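Before diving in, the three steps can be sketched end to end with every component stubbed out. The `embed` and `call_llm` functions below are toy stand-ins (not real APIs), only there to make the control flow visible:

```python
# Step 1: data preprocessing/embedding — chunk documents, embed them, index them.
def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model such as text-embedding-ada-002.
    return [float(sum(map(ord, text)) % 97)]

documents = ["Contract A expires in 2025.", "Contract B covers data privacy."]
index = [(embed(doc), doc) for doc in documents]

# Step 2: prompt construction/retrieval — fetch the closest chunk, build a prompt.
def retrieve(query: str) -> str:
    q = embed(query)
    return min(index, key=lambda item: abs(item[0][0] - q[0]))[1]

def build_prompt(query: str) -> str:
    return f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer:"

# Step 3: prompt execution/inference — submit the compiled prompt to an LLM.
def call_llm(prompt: str) -> str:
    # Stand-in for a proprietary model API or a self-hosted open-source model.
    return f"(model answer grounded in: {prompt.splitlines()[0]})"

print(call_llm(build_prompt("When does Contract A expire?")))
```

A real system swaps each stub for production components (an embedding API, a vector database, an LLM endpoint), but the shape of the data flow stays the same.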
Data preprocessing/embedding

[Figure: data preprocessing and embedding pipeline]

The source data may come in many formats, such as PDFs, CSVs, or structured SQL data. There are many ways to process and transform this data: some people prefer ETL (Extract, Transform, Load) tools like Databricks or Airflow, while others tend to use orchestration frameworks like LangChain and LlamaIndex.

Embedding is a method that converts high-dimensional, discrete, or unordered data (such as words, sentences, users, or products) into low-dimensional, continuous vectors. Many developers directly use the OpenAI API, for example the text-embedding-ada-002 model. Some large companies opt for Cohere, while developers who prefer open source might choose Hugging Face's Sentence Transformers library: https://huggingface.co/sentence-transformers

A vector database is a database specifically designed for storing and processing vector data. It stores embedding vectors effectively and supports efficient querying over them. Many people choose Pinecone, while there are also open-source options such as Weaviate, Vespa, and Qdrant, local vector-management libraries like Chroma and Faiss, and OLTP extensions like pgvector.

The context window usually refers to the amount or range of input the model can reference when generating predictions. In text processing, it may mean the number of words surrounding the current word; for language models, a larger context window means the model can consider more history when generating predictions.

Regarding data preprocessing and embedding, some believe that as the available context window of large models grows, embeddings will be folded directly into prompts. Experts hold the opposite view: as the context window expands, computational cost also increases, and using embeddings improves efficiency.

Prompt construction/retrieval

[Figure: prompt construction and retrieval]
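The "efficient querying" that a vector database performs is, at its core, nearest-neighbour search over embeddings. The sketch below shows the idea with cosine similarity in pure Python; the vectors are made-up toy embeddings, not real model output:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product of the vectors divided by their norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "vector database": chunk text mapped to its (made-up) embedding.
store = {
    "clause on termination": [0.9, 0.1, 0.0],
    "clause on payment terms": [0.1, 0.9, 0.2],
}
query_vec = [0.85, 0.15, 0.05]  # toy embedding of "how can the contract end?"

best = max(store, key=lambda k: cosine(store[k], query_vec))
print(best)  # -> clause on termination
```

Production vector databases add approximate-nearest-neighbour indexes so this search stays fast over millions of vectors, but the similarity computation is the same idea.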
There are also more advanced prompt-engineering techniques, such as Chain-of-Thought (CoT) and Tree of Thoughts (ToT); see https://www.promptingguide.ai/techniques. I will analyze these techniques in detail later. They can be used to build chatbots, perform document-based question answering, and more.

The two orchestration frameworks mentioned in the previous step, LangChain and LlamaIndex, play important roles here. They abstract away many details of prompt chaining; define interfaces to external APIs (including determining when an API call is needed); retrieve contextual data from vector databases; and maintain memory across multiple LLM calls. They also provide templates for many common applications. Their output is one or a series of prompts to be submitted to the language model. I have covered some LangChain basics in past articles, with more detailed sharing to come.

Prompt execution/inference

[Figure: prompt execution and inference]

Currently, OpenAI leads the LLM field, and people typically start building LLM applications with the gpt-4 or gpt-4-32k model.
However, when the product enters the scaling phase, they usually switch to gpt-3.5-turbo, which is less accurate but costs about one-fiftieth as much as gpt-4 and runs faster. Anthropic's Claude also provides an API, with a context window of up to 100k tokens. Some developers choose open-source models, and cloud services like Databricks, Anyscale, Mosaic, Modal, and RunPod offer corresponding preset tools and services. Hugging Face and Replicate also provide simple APIs and front-end interfaces for running AI applications.

Open-source LLMs still lag behind proprietary models like OpenAI's, but the gap is narrowing: examples include Meta's LLaMa, as well as models from Together, Mosaic, Falcon, and Mistral. Meta is also preparing to open-source LLaMA 2.

"Operational tools" usually refers to tools for monitoring, managing, optimizing, or debugging model operation; they are not yet widely used among developers.
The following types of tools can all be considered operational tools:

- Logging tools, such as Weights & Biases, MLflow, PromptLayer, and Helicone, record model inputs, outputs, and operating state, helping developers understand model behavior.
- Validation tools, such as Guardrails and Rebuff, check model outputs and guard against problems such as prompt injection.
- Caching tools, such as Redis, store model outputs so they can be retrieved quickly when needed, improving the application's response speed and cost-effectiveness.
- Security tools detect and prevent abuse of or attacks on the model, protecting its security and stability.

The non-model parts of LLM applications are usually hosted in the cloud, for example on Vercel.
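The caching bullet above can be made concrete in a few lines: memoize responses keyed on the exact prompt. In production the dict would typically be an external store such as Redis; `call_llm` here is a hypothetical stub:

```python
cache: dict[str, str] = {}
calls = 0  # counts how many real model calls were made

def call_llm(prompt: str) -> str:
    # Hypothetical stub for an expensive chat-completion API call.
    global calls
    calls += 1
    return f"answer to: {prompt}"

def cached_llm(prompt: str) -> str:
    # Serve repeated prompts from the cache instead of re-invoking the model.
    if prompt not in cache:
        cache[prompt] = call_llm(prompt)
    return cache[prompt]

cached_llm("What is an embedding?")
cached_llm("What is an embedding?")  # second call is served from the cache
print(calls)  # -> 1
```

Exact-string caching only helps when identical prompts recur; semantic caches relax this by matching prompts on embedding similarity instead.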
Additionally, two new hosting approaches are emerging: Steamship provides end-to-end hosting for developers, including LangChain, multi-tenant data contexts, asynchronous tasks, vector storage, and key management; in another direction, companies like Anyscale and Modal let developers host models and Python code in the same place.

AI agent frameworks

I will demonstrate Reflexion in the future if there is an opportunity. However, most AI agent frameworks are still at the proof-of-concept stage: they can produce stunning demos, but they cannot yet complete tasks reliably and reproducibly. I will keep a close eye on their development.

LLMs make it possible to create products that were previously out of reach. Whatever the application, AI empowers individual developers to build, in just a few days, astonishing results that surpass supervised machine-learning projects that took large teams months. a16z's framework can help us make better use of this powerful tool.