Problem Definition

브라이티_ 2023. 7. 15. 12:38

Let me pin down exactly what I need to do.

https://llm-efficiency-challenge.github.io/challenge

NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day

NeurIPS 2023 Challenge

1. Problem

Using one Large Language (base) Model, open-source datasets, and one GPU, fine-tune the model within one day (24 hours) so that it performs well on the given hold-out tasks.
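One way to respect the 24-hour limit is to treat it as a wall-clock budget inside the training loop. The following is a minimal sketch under my own assumptions: `TimeBudget`, `train_with_budget`, and `step_fn` are hypothetical names, and the challenge does not prescribe this mechanism.

```python
import time


class TimeBudget:
    """Tracks how much of a wall-clock budget is left (hypothetical helper)."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self.limit_seconds = limit_seconds
        self._clock = clock
        self._start = clock()

    def elapsed(self):
        return self._clock() - self._start

    def remaining(self):
        return max(0.0, self.limit_seconds - self.elapsed())

    def exhausted(self, reserve_seconds=0.0):
        # True once the remaining time is no larger than the reserve
        # (e.g. time kept aside for saving a checkpoint).
        return self.remaining() <= reserve_seconds


def train_with_budget(step_fn, budget, max_steps, reserve_seconds=0.0):
    """Run step_fn(step) until max_steps or the budget runs out.

    step_fn stands in for one optimizer step; returns the number of steps run.
    """
    steps = 0
    while steps < max_steps and not budget.exhausted(reserve_seconds):
        step_fn(steps)
        steps += 1
    return steps


if __name__ == "__main__":
    ONE_DAY = 24 * 60 * 60  # the 1-day limit, in seconds
    budget = TimeBudget(ONE_DAY)
    done = train_with_budget(lambda step: None, budget, max_steps=1000,
                             reserve_seconds=30 * 60)
    print(f"ran {done} steps, {budget.remaining():.0f}s left")
```

The budget is only checked between steps, so a single step can still overrun it. Keeping a reserve larger than one step (plus checkpointing time) is the simplest guard against that.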

2. Constraints

2-1. Allowed base models

The starting model for the competition should be an open (MIT or Apache 2) base model without instruction-tuning. All sizes of the common autoregressive and autoencoder base models listed below are allowed.

Falcon
LLaMA (as long as you’re not using pirated weights)
OpenLLaMA
Red Pajama Base (not instruction tuned models)
MPT
OPT
Bloom
GPT Neo, J, NeoX, Pythia
GPT2
T5 (not Flan-T5)
BART
DeBERTa
RoBERTa
BERT
ALBERT
DistilBERT
Electra
UL2

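I think I should design the overall fine-tuning approach first, then apply it to each model and compare the results. ~~(I really want to try LLaMA, the hottest model right now, so I applied to Facebook for the research weights, but I haven't heard back yet and I'm a little worried.)~~

2-2. Allowed GPUs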

NVIDIA 4090
NVIDIA A100 (maybe 40GB)

3. Tasks

The evaluation process in our competition will be conducted in two stages. In the first stage, we will run a subset of HELM benchmark along with a set of secret holdout tasks. The holdout tasks will consist of logic reasoning type of multiple-choice Q&A scenarios as well as conversational chat tasks. Submissions will be ranked based on their performance across all tasks. The ranking will be determined by the geometric mean across all evaluation tasks. This score will be shown in the leaderboard.
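Because the ranking uses a geometric mean, one weak task pulls the overall score down more than it would under a plain average. A small sketch to make that concrete: the scores are made-up numbers, not leaderboard results, and rejecting non-positive scores is my own assumption, since the quoted rules do not say how a zero score is handled.

```python
import math


def geometric_mean(scores):
    """Geometric mean of strictly positive per-task scores."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        # Assumption: zero or negative scores are rejected here; the
        # organizers' actual handling is not stated in the quoted rules.
        raise ValueError("scores must be strictly positive")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical per-task scores with the same arithmetic mean (0.6).
balanced = [0.6, 0.6, 0.6]
lopsided = [0.9, 0.8, 0.1]

if __name__ == "__main__":
    for name, scores in [("balanced", balanced), ("lopsided", lopsided)]:
        arithmetic = sum(scores) / len(scores)
        print(f"{name}: arithmetic={arithmetic:.3f} "
              f"geometric={geometric_mean(scores):.3f}")
```

Both score sets average 0.6 arithmetically, but the lopsided one has a geometric mean of roughly 0.42. That suggests aiming for a model that is reasonably good everywhere rather than excellent on a few tasks.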

Secret... holdout... tasks, really...

Does this mean I have to build a multi-tasking model that handles logic reasoning tasks well across the board...

For now, I'm orienting the fine-tuning around (multiple-choice) Q&A tasks.
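To fine-tune on multiple-choice Q&A, each example first has to be flattened into a text prompt. Here is a rough sketch of one possible template. The layout (`Question:`, lettered options, `Answer:`) and the function name `format_multiple_choice` are my own assumptions, not the format HELM or the organizers actually use.

```python
import string


def format_multiple_choice(question, choices, answer_index=None):
    """Render a multiple-choice item as a prompt string.

    With answer_index set, the answer letter is appended (training example);
    without it, the prompt ends at "Answer:" (inference).
    """
    letters = string.ascii_uppercase
    if not 0 < len(choices) <= len(letters):
        raise ValueError("need between 1 and 26 choices")
    lines = [f"Question: {question}"]
    for letter, choice in zip(letters, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    prompt = "\n".join(lines)
    if answer_index is None:
        return prompt
    if not 0 <= answer_index < len(choices):
        raise ValueError("answer_index out of range")
    return f"{prompt} {letters[answer_index]}"


if __name__ == "__main__":
    # Made-up example item
    print(format_multiple_choice(
        "If all A are B and all B are C, then all A are:",
        ["C", "not C", "undetermined"],
        answer_index=0,
    ))
```

Using the same template for training and inference keeps the model's expected output down to a single letter, which is easy to score.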

4. Deadline

Submit by September 30, 2023

5. Submission format

As far as I can tell, all source code has to be submitted in a reproducible form.

Submissions must be reproducible from initial model through fine tuning and inference. Winning models, along with all associated code and data, must be open-sourced and made public after the competition under the MIT or Apache 2 license. Submissions must NOT use any copyrighted or proprietary data, code, or closed-source content. The use of data or content that breaks service contracts or trade secrets of any entity is not allowed.