Deep Dive into LLMs like ChatGPT
2025-02-16 | 2025-02-18

pre-training

 
download and preprocess a large crawl of the internet (filter and deduplicate the text)
 
 
tokenization
group 8 bits —> one byte (e.g. 01111100 —> 124), so the raw bit stream becomes a shorter sequence over 256 unique symbols (think of them as emojis)
byte-pair encoding: repeatedly merge the most frequent pair of adjacent symbols into a new symbol —> the sequence gets shorter (compression) and the vocabulary grows
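A minimal sketch of the byte-pair merging idea (toy code, not a real production tokenizer):

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent pairs in the sequence and return the most common one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single new symbol `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "hello hello hello world"
ids = list(text.encode("utf-8"))        # step 1: bytes, 256 possible symbols
for new_id in range(256, 256 + 5):      # step 2: mint a few new symbols
    pair = most_frequent_pair(ids)
    ids = merge(ids, pair, new_id)
    print(f"merged {pair} -> {new_id}, length now {len(ids)}")
```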
 
 
 
network training:
input: a window of tokens (the context)
output: a probability distribution over the next token (this prediction is made in parallel at every position, across windows sampled from the whole dataset)
 
nn internals:
the weights (parameters) are nudged by gradient descent so the predicted probabilities better match the actual next tokens in the data
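A toy sketch of one training step, assuming PyTorch and a deliberately tiny stand-in model (real LLMs are transformers with billions of parameters, but the loop has the same shape):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim, context = 300, 64, 16

# Tiny stand-in for a transformer: embed tokens, mix, project back to vocabulary logits.
model = nn.Sequential(
    nn.Embedding(vocab_size, dim),
    nn.Linear(dim, dim),
    nn.ReLU(),
    nn.Linear(dim, vocab_size),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (8, context + 1))   # a batch of token windows
inputs, targets = tokens[:, :-1], tokens[:, 1:]           # predict the next token at every position

logits = model(inputs)                                    # (batch, context, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
opt.zero_grad()
loss.backward()                                           # gradients w.r.t. every weight
opt.step()                                                # nudge the weights to lower the loss
print(loss.item())
```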
 
 
inference:
predict one token at a time: sample from the output distribution, append the sampled token to the context, and repeat
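Inference as a loop, continuing the toy model from the training sketch above (assumption: the model maps a token window to next-token logits):

```python
# Sampled decoding with the toy model defined in the training sketch.
context_tokens = torch.randint(0, vocab_size, (1, 4))     # some starting prompt tokens
for _ in range(10):
    logits = model(context_tokens)[:, -1, :]              # logits for the position after the last token
    probs = F.softmax(logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)  # sample one token from the distribution
    context_tokens = torch.cat([context_tokens, next_token], dim=1)
print(context_tokens.tolist())
```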
 
 
lower loss —> better network (its predictions assign higher probability to the true next tokens)
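The loss here is the standard next-token cross-entropy, i.e. the average negative log-likelihood of the training tokens:

```latex
L(\theta) = -\frac{1}{N} \sum_{t=1}^{N} \log p_\theta\left(x_t \mid x_{<t}\right)
```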
 
 
release of a model:
github: the code of the model, i.e. the sequence of operations in the forward pass of the neural net
parameters: a list of billions of numbers (the weights)
 
examples of released models: GPT-2, Llama 3
 
parameters —> like a (lossy) zip file of the internet
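GPT-2's code and weights are openly released; a minimal sketch of loading them with the Hugging Face transformers library (assumes `transformers` and `torch` are installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # downloads the tokenizer definition
model = AutoModelForCausalLM.from_pretrained("gpt2")   # downloads the parameter file (~500 MB)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")                      # ~124M for the smallest GPT-2

out = model.generate(**tok("The capital of France is", return_tensors="pt"), max_new_tokens=10)
print(tok.decode(out[0]))
```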
 
 
in-context learning (with a few-shot prompt: give a few examples in the context and the model continues the pattern)
 
 
—> base model
an internet document simulator (it just continues text the way documents on the internet would continue)
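A base model only continues documents, but that is already enough for in-context learning: given a few examples, it continues the pattern. An illustrative (made-up) few-shot prompt:

```
sea otter -> loutre de mer
peppermint -> menthe poivrée
cheese -> fromage
plush giraffe ->
```

A good base model completes this with the French translation ("girafe en peluche"), purely by pattern-matching on the context.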
 
 
 

post-training: supervised finetuning

conversations are turned into tokens using a chat protocol / template (special tokens mark the roles and the start/end of each turn)
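A sketch of what such a protocol looks like, using the Hugging Face chat-template helper (the exact special tokens vary by model; the model name below is just one example with a published template):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # any chat model with a template
messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]
text = tok.apply_chat_template(messages, tokenize=False)
print(text)    # turns are wrapped in special markers, e.g. <|user|> ... <|assistant|> ...
print(tok.apply_chat_template(messages, tokenize=True))   # then tokenized like any other text
```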
 
how the conversation dataset is built:
answers written by human labelers (e.g. at companies like Scale AI), increasingly with LLM assistance
 
 
 
 
hallucination —> teach the llm to say "I don't know"
1. take a paragraph —> construct questions from it, then probe the model a few times (e.g. 3 samples) and compare its answers with the correct one
—> for questions the model consistently gets wrong, add a new QA pair whose answer is "I don't know" to the training set —> the llm learns to say "I don't know" instead of guessing
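A rough sketch of that pipeline; `ask_model` and `same_answer` are hypothetical stand-ins for calling the LLM and for an answer-equivalence check (e.g. another LLM acting as judge):

```python
def probe_knowledge(question, correct_answer, ask_model, same_answer, n_samples=3):
    """Sample the model a few times; return True if it reliably knows the answer."""
    answers = [ask_model(question) for _ in range(n_samples)]
    return all(same_answer(a, correct_answer) for a in answers)

def build_idk_examples(qa_pairs, ask_model, same_answer):
    """For questions the model gets wrong, create training examples that say 'I don't know'."""
    new_examples = []
    for question, correct_answer in qa_pairs:
        if not probe_knowledge(question, correct_answer, ask_model, same_answer):
            new_examples.append({"question": question, "answer": "I don't know."})
    return new_examples
```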
 
2. let the model use tools and do a search
tool use: the model emits special tokens, e.g. <search_start> query <search_end>; the text between them is the search query, and the results are pasted back into the context
—> such tool-use demonstrations are added to the training data so the model learns when and how to search
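A toy sketch of how those special tokens could be handled at inference time; the tag names follow the example above, and `run_search` is a hypothetical search backend:

```python
import re

SEARCH_TAG = re.compile(r"<search_start>(.*?)<search_end>", re.DOTALL)

def maybe_run_search(generated_text, run_search):
    """If the model emitted a search call, execute it and return text to paste back into the context."""
    match = SEARCH_TAG.search(generated_text)
    if match is None:
        return None                     # no tool call, nothing to do
    query = match.group(1).strip()      # the text between the special tokens is the query
    results = run_search(query)         # hypothetical search backend
    return f"<search_results>{results}</search_results>"

# usage: generation pauses when <search_end> is emitted, the results are appended to
# the context (working memory), and generation resumes with the retrieved text visible.
print(maybe_run_search("Let me check. <search_start>who is Tom Cruise's mother<search_end>",
                       run_search=lambda q: f"[results for: {q}]"))
```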
 
knowledge in the parameters== vague recollection
knowledge in the tokens of the context window== working memory
 
model of self:
if the finetuning data has no examples for questions like "who are you?", the model tends to answer "ChatGPT", simply because there is so much ChatGPT-related text on the internet —> giving a model its own identity needs dedicated SFT examples (or a system prompt)
 
 
models need tokens to think
let the model work toward the answer gradually, token by token, instead of forcing the final answer out right away
[notion image: two example answers to the same math word problem; the one on the right, which spreads the computation over more tokens before stating the answer, is the better training example]
 
a single forward pass can only do a limited amount of computation —> not enough to solve a hard problem in one step
 
—> having the model use code (a tool) is often a better way to answer math questions than mental arithmetic in tokens
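For example, instead of producing the digits of a big multiplication directly out of a few forward passes, a tool-using model can emit a short program and let the interpreter do the arithmetic; an illustrative snippet of what it might write:

```python
# what a tool-using model might emit for "What is 1234 * 5678 + 91?"
result = 1234 * 5678 + 91
print(result)  # 7006743 -- computed exactly by the interpreter, not by the network
```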
 
 
models are not good at spelling and letter-counting: they see tokens (text chunks), not individual letters
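A quick way to see this, using OpenAI's tiktoken library (assumes it is installed; the exact splits depend on the tokenizer):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # a GPT-4-era tokenizer
word = "ubiquitous"
ids = enc.encode(word)
print(ids)                                     # a handful of token ids, not 10 letters
print([enc.decode([i]) for i in ids])          # e.g. ['ub', 'iqu', 'itous'] -- chunks, not characters
```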
 
 
 
after this stage we have the SFT (supervised finetuning) model
 
 

post-training: reinforcement learning

 
pretraining: background knowledge (like the expository text of a textbook)
supervised finetuning: worked problems (problem + expert answer, to imitate)
reinforcement learning: practice problems (the model tries many solutions and reinforces the ones that reach the correct answer; sketched below)
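A rough sketch of the verifiable-reward idea: sample many attempts, keep the ones that end in the right answer, and train on them. `ask_model` and `extract_final_answer` are hypothetical stand-ins:

```python
def collect_reinforced_solutions(problem, correct_answer, ask_model, extract_final_answer,
                                 n_attempts=16):
    """Let the model try the problem many times; keep the attempts that reach the correct answer."""
    good_solutions = []
    for _ in range(n_attempts):
        attempt = ask_model(problem)                          # full reasoning + final answer
        if extract_final_answer(attempt) == correct_answer:   # verifiable reward: did it get it right?
            good_solutions.append(attempt)
    # successful attempts become training data, reinforcing whatever reasoning led to them
    return [{"problem": problem, "solution": s} for s in good_solutions]
```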
 
 
DeepSeek-R1 —> a reasoning model trained with RL
 
 
rlhf (reinforcement learning from human feedback)
for tasks without a checkable answer (e.g. writing jokes), humans compare and rank the model's outputs
—> this gives a better training signal than asking humans to write the ideal outputs themselves
 
problem: this costs a lot of human time
 
method: collect human preference comparisons —> train a neural-net simulator of human preferences (a reward model) —> let the reward model give feedback at scale during RL (sketched below)
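The reward model is typically trained on pairwise comparisons with a Bradley-Terry-style loss (the chosen response should score higher than the rejected one); a minimal PyTorch sketch with a made-up scoring network operating on pretend embeddings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# stand-in reward model: maps an embedding of a (prompt, response) pair to a scalar score
reward_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# pretend embeddings of responses the human preferred vs. rejected for the same prompts
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)

r_chosen = reward_model(chosen).squeeze(-1)
r_rejected = reward_model(rejected).squeeze(-1)
loss = -F.logsigmoid(r_chosen - r_rejected).mean()   # push chosen scores above rejected scores
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```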
 
caveat: don't run RL against the reward model for too long; it can be gamed (the model finds outputs the reward model scores highly but a human would not)
 
 