Do neural networks really resemble the structure of the brain?
- data: text sequence
- tokenization
- converts text into numbers; byte pair encoding (BPE) iteratively merges the most frequent adjacent pairs
- keeps the vocabulary size manageable (≈ 100k tokens) and generalizes better, since subwords can represent unknown words (see the BPE sketch below)
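A minimal sketch of the BPE merge loop described above, assuming a toy word-frequency corpus and a small fixed number of merges (both hypothetical; real tokenizers operate on bytes and learn tens of thousands of merges):

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word -> frequency, each word starting as a tuple of characters.
corpus = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
num_merges = 10  # hypothetical; real vocabularies require far more merges

merges = []
for _ in range(num_merges):
    pairs = get_pair_counts(corpus)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)   # most frequent adjacent pair
    corpus = merge_pair(corpus, best)  # merge it everywhere in the corpus
    merges.append(best)

print(merges)  # learned merge rules, e.g. ('e', 'r') first
```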
- transformer
- input: a sequence of tokens, combined with the model's parameters (≈ hundreds of billions of weights)
- processes them through one large, fixed mathematical function (sketch below)
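A heavily simplified sketch of that "complex math function", assuming a single attention block with random toy parameters (real models stack many such blocks plus MLPs, normalization, etc.):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 1000, 64, 8  # toy sizes; real models are far larger

# "Parameters": random here, learned during training in a real model.
W_embed = rng.normal(0, 0.02, (vocab_size, d_model))
W_q = rng.normal(0, 0.02, (d_model, d_model))
W_k = rng.normal(0, 0.02, (d_model, d_model))
W_v = rng.normal(0, 0.02, (d_model, d_model))
W_out = rng.normal(0, 0.02, (d_model, vocab_size))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_step(token_ids):
    """One attention block, heavily simplified: tokens + parameters -> next-token distribution."""
    x = W_embed[token_ids]                     # (seq, d_model) token embeddings
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d_model)        # attention scores between positions
    mask = np.triu(np.ones((len(token_ids), len(token_ids))), k=1).astype(bool)
    scores[mask] = -1e9                        # causal mask: no peeking at future tokens
    x = softmax(scores) @ v                    # mix information across positions
    logits = x[-1] @ W_out                     # last position predicts the next token
    return softmax(logits)                     # probability over the whole vocabulary

probs = transformer_step(rng.integers(0, vocab_size, seq_len))
print(probs.shape, probs.sum())                # (1000,) 1.0
```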
- pretraining
- the model outputs a probability distribution over the next token
- adjust the weights so the correct next token gets higher probability (training-step sketch below)
- result: a base model that is generative, not conversational
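A minimal sketch of one such training step, assuming a tiny PyTorch stand-in model (hypothetical sizes; the principle is just next-token cross-entropy):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 1000, 64  # toy sizes

# Stand-in for the transformer: embedding -> linear head over the vocabulary.
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (4, 33))   # a batch of token sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # each position predicts the next token

logits = model(inputs)                                   # (batch, seq, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size),   # predicted distribution
                       targets.reshape(-1))              # vs. the actual next token
loss.backward()                                          # gradients: how to adjust each weight
optimizer.step()                                         # nudge weights toward better probabilities
optimizer.zero_grad()
print(loss.item())
```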
- post-training (SFT)
- train on prompt-response token sequences so the model learns to hold conversations (chat-template sketch below)
- limitation: the model imitates a fixed set of human-written responses, which is rigid and limited in coverage
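A sketch of how a conversation might be flattened into one token sequence for this stage, with hypothetical special markers (real systems use their own chat templates and token IDs):

```python
# Hypothetical special markers delimiting conversation turns.
BOS, USER, ASSISTANT, EOT = "<|bos|>", "<|user|>", "<|assistant|>", "<|end|>"

def render_chat(turns):
    """Flatten (role, text) turns into one training string; loss is usually applied
    only to the assistant spans so the model learns to answer, not to ask."""
    parts = [BOS]
    for role, text in turns:
        marker = USER if role == "user" else ASSISTANT
        parts.append(f"{marker}{text}{EOT}")
    return "".join(parts)

conversation = [
    ("user", "Why is the sky blue?"),
    ("assistant", "Shorter (blue) wavelengths scatter more in the atmosphere."),
]
print(render_chat(conversation))
# The rendered string is then tokenized and trained on with the same
# next-token objective as pretraining, just on conversational data.
```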
- RLHF
- a separate reward model is trained on human rankings of responses, so it learns human preferences (ranking-loss sketch below)
- compared with pure imitation: more scalable, consistent, and adaptable
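A sketch of the reward-model training objective, assuming a pairwise (chosen vs. rejected) ranking loss on a toy model (real reward models are full transformers; the sizes and pooling here are hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 1000, 64  # toy sizes

class RewardModel(nn.Module):
    """Maps a (prompt + response) token sequence to a single scalar reward."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, 1)

    def forward(self, tokens):
        hidden = self.embed(tokens).mean(dim=1)  # crude pooling over the sequence
        return self.head(hidden).squeeze(-1)     # (batch,) scalar rewards

reward_model = RewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-3)

# Each pair: the response a human ranked higher vs. the one ranked lower.
chosen = torch.randint(0, vocab_size, (4, 32))
rejected = torch.randint(0, vocab_size, (4, 32))

r_chosen, r_rejected = reward_model(chosen), reward_model(rejected)
# Pairwise ranking loss: push the chosen reward above the rejected one.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
print(loss.item())
```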
- hallucination
- the model simulates answers rather than recalling stored facts (its confidence is simulated too, so it can sound certain while being wrong)
- mitigation: train it to admit ignorance on questions it does not actually know (data-construction sketch below)
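A sketch of how such "admit ignorance" training data might be built, under the assumption that we probe the model on factual questions and add refusal examples where it is unreliable (the `sample_model_answers` helper is a hypothetical stand-in for querying the model):

```python
import random

def sample_model_answers(question, n=5):
    """Hypothetical stand-in for sampling the model several times; a real pipeline
    would query the LLM being trained and collect its answers."""
    return [random.choice(["Answer A", "Answer B", "I'm not sure"]) for _ in range(n)]

def build_ignorance_examples(qa_pairs, threshold=0.5):
    """Keep an 'I don't know' example whenever the model rarely hits the known answer."""
    examples = []
    for question, true_answer in qa_pairs:
        answers = sample_model_answers(question)
        accuracy = sum(a == true_answer for a in answers) / len(answers)
        if accuracy < threshold:  # the model does not reliably know this fact
            examples.append({"prompt": question,
                             "response": "I don't know the answer to that."})
    return examples

qa_pairs = [("Who won the 1903 Tour de France?", "Maurice Garin")]
print(build_ignorance_examples(qa_pairs))
```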