" " Improving language models by retrieving from trillions of tokens – Web 3 News Hubb " "
Web 3 News Hubb
  • Home
  • Edge Computing
  • Artificial Intelligence
  • Blockchain
  • Contact
No Result
View All Result
Web 3 News Hubb
  • Home
  • Edge Computing
  • Artificial Intelligence
  • Blockchain
  • Contact
No Result
View All Result
Web 3 News Hubb
No Result
View All Result
Home Artificial Intelligence

Improving language models by retrieving from trillions of tokens

August 8, 2023

In recent years, significant performance gains in autoregressive language modeling have been achieved by increasing the number of parameters in Transformer models. This has led to a tremendous increase in training energy cost and resulted in a generation of dense “Large Language Models” (LLMs) with 100+ billion parameters. Simultaneously, large datasets containing trillions of words have been collected to facilitate the training of these LLMs.

We explore an alternate path for improving language models: we augment transformers with retrieval over a database of text passages including web pages, books, news and code. We call our method RETRO, for “Retrieval Enhanced TRansfOrmers”.
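
To make this concrete, the sketch below shows what the retrieval side could look like in Python. It is an illustration of the idea rather than DeepMind's implementation: the embed() helper is a hypothetical stand-in for the frozen text encoder used to key the database, the brute-force cosine search stands in for the approximate nearest-neighbour index that a trillion-token database would require, and the neighbour count is an assumed parameter.

# Illustrative sketch of RETRO-style retrieval, not DeepMind's code.
# Hypothetical pieces: embed() stands in for a frozen text encoder, and the
# brute-force cosine search stands in for the approximate nearest-neighbour
# index used over the real multi-trillion-token database.
import hashlib

import numpy as np

NUM_NEIGHBOURS = 2  # assumption: neighbours returned per input chunk


def embed(texts):
    """Toy stand-in for a frozen encoder: one deterministic vector per text."""
    vectors = []
    for text in texts:
        seed = int(hashlib.md5(text.encode("utf-8")).hexdigest()[:8], 16)
        vectors.append(np.random.default_rng(seed).normal(size=128))
    return np.stack(vectors).astype(np.float32)


def build_database(chunks):
    """Embed every chunk once and keep (chunk, continuation) pairs as values.

    Each neighbour is stored together with the chunk that follows it in the
    source document, because retrieval returns sequences and their continuations.
    """
    keys = embed(chunks)
    keys /= np.linalg.norm(keys, axis=1, keepdims=True)
    continuations = chunks[1:] + [""]
    return keys, list(zip(chunks, continuations))


def retrieve(query_chunk, keys, values, k=NUM_NEIGHBOURS):
    """Return the k most similar database chunks and their continuations."""
    query = embed([query_chunk])[0]
    query /= np.linalg.norm(query)
    scores = keys @ query  # cosine similarity against every key
    top = np.argsort(-scores)[:k]
    return [values[i] for i in top]


# Toy usage: three database passages, one query chunk.
keys, values = build_database([
    "The Eiffel Tower is located in Paris.",
    "It was completed in 1889 for the World's Fair.",
    "Gustave Eiffel's company designed the structure.",
])
neighbours = retrieve("Where is the Eiffel Tower?", keys, values)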

Figure 1: A high-level overview of Retrieval Enhanced TransfOrmers (RETRO).

In traditional transformer language models, the benefits of model size and data size are linked: as long as the dataset is large enough, language modeling performance is limited by the size of the model. However, with RETRO the model is not limited to the data seen during training – it has access to the entire training dataset through the retrieval mechanism. This results in significant performance gains compared to a standard Transformer with the same number of parameters. We show that language modeling improves continuously as we increase the size of the retrieval database, at least up to 2 trillion tokens – 175 full lifetimes of continuous reading.

Figure 2: Increasing the size of the retrieval dataset results in large gains in model performance.

For each text passage (approximately a paragraph of a document), a nearest-neighbor search is performed which returns similar sequences found in the training database, together with their continuations. These sequences help predict the continuation of the input text. The RETRO architecture interleaves regular self-attention at a document level with cross-attention over retrieved neighbors at a finer passage level. This results in both more accurate and more factual continuations. Furthermore, RETRO increases the interpretability of model predictions, and provides a route for direct interventions through the retrieval database to improve the safety of text continuation. In our experiments on the Pile, a standard language modeling benchmark, a 7.5 billion parameter RETRO model outperforms the 175 billion parameter Jurassic-1 on 10 out of 16 datasets and outperforms the 280B Gopher on 9 out of 16 datasets.
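
The interleaving of attention described above can also be sketched compactly. The block below is a simplified, single-head, unbatched illustration in plain NumPy, assuming the retrieved neighbours have already been encoded into vectors; the published architecture uses multi-head chunked cross-attention, feed-forward sublayers and causal masking, and applies cross-attention only in a subset of decoder blocks.

# Simplified sketch of interleaving self-attention with cross-attention over
# retrieved neighbours. Illustrative only, not the published RETRO architecture.
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention: (q_len, d), (kv_len, d) -> (q_len, d)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values


def retro_block(chunk_states, neighbour_states, use_cross_attention):
    """One decoder block: self-attention over the current chunk, optionally
    followed by cross-attention from the chunk to its encoded neighbours.
    Causal masking and feed-forward sublayers are omitted for brevity."""
    h = chunk_states + attention(chunk_states, chunk_states, chunk_states)
    if use_cross_attention and neighbour_states is not None:
        h = h + attention(h, neighbour_states, neighbour_states)
    return h


# Toy usage: 64 tokens of the current chunk attend to 128 encoded neighbour tokens.
rng = np.random.default_rng(0)
chunk = rng.normal(size=(64, 256))
neighbour_tokens = rng.normal(size=(128, 256))
out = retro_block(chunk, neighbour_tokens, use_cross_attention=True)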

Below, we show two samples from our 7B baseline model and from our 7.5B RETRO model that highlight how RETRO’s samples are more factual and stay more on topic than the baseline samples.

Figure 3: The baseline only generates 2 correct digits. With RETRO, the correct digits are generated after being retrieved from the database.
Figure 4: The RETRO model stays more on-topic than the baseline sample.


