@itsandrewgao Yes, I did this on Falcon7B (who?) last summer. Happens if your training data is tiny.
@FlipFloopDev But can you overfit through pretraining?
@itsandrewgao @FlipFloopDev Why wouldn't you? Of course you can. It wouldn't be able to generalize outside of its pre-training data.
@essobi @itsandrewgao @FlipFloopDev Point is nobody does: training data is so large that most big models never get through even a single epoch (or only a few).
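The thread's point can be sketched with a toy experiment (not the actual Falcon7B setup; all numbers here are illustrative): give a high-capacity model a tiny training set and it memorizes it, driving train error to near zero while test error stays high.

```python
import numpy as np

# Toy illustration: a degree-7 polynomial has enough capacity to
# interpolate 8 training points exactly ("overfit"), but the memorized
# fit does not generalize to fresh samples from the same distribution.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.1, n)  # noisy target function
    return x, y

x_train, y_train = make_data(8)    # tiny training set
x_test, y_test = make_data(100)    # held-out data

coeffs = np.polyfit(x_train, y_train, deg=7)  # capacity >= data points

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
# train_mse is near zero; test_mse is much larger
```

Large-scale pretraining avoids this regime by the mechanism the last reply describes: the corpus is so large relative to the model that each example is seen at most a handful of times.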