github.com/google-researc… (based on JAX/Linen) allows you to play with Memorizing and Block-Recurrent Transformers. Unlike other codebases, it is based on Transformer-XL and lets you train on long documents using sliding-window attention.
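For readers unfamiliar with the setup: sliding-window attention lets each token attend only to a bounded window of recent tokens, while a Transformer-XL-style cache carries context across segments, so cost stays flat on long documents. A minimal JAX sketch of the windowed causal-masking idea (not the Meliad implementation; shapes and names are illustrative):

```python
import jax
import jax.numpy as jnp

def sliding_window_attention(q, k, v, window):
    """Causal attention where each query sees only the previous `window`
    positions (itself included). q, k, v: [seq_len, num_heads, head_dim]."""
    seq_len, _, head_dim = q.shape
    scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(head_dim)
    pos_q = jnp.arange(seq_len)[:, None]
    pos_k = jnp.arange(seq_len)[None, :]
    # Allow key j for query i only if i - window < j <= i.
    mask = (pos_k <= pos_q) & (pos_k > pos_q - window)
    scores = jnp.where(mask[None, :, :], scores, -1e30)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("hqk,khd->qhd", weights, v)

# Tiny smoke test.
rng = jax.random.PRNGKey(0)
x = jax.random.normal(rng, (16, 4, 8))            # [seq, heads, dim]
out = sliding_window_attention(x, x, x, window=4)
print(out.shape)                                  # (16, 4, 8)
```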
@ChrSzegedy Wonderful, thank you for the pointer and the amazing research!
@erik_nijkamp This code was mostly developed by DeLesley Hutchins, with contributions from @MarkusNRabe and @Yuhu_ai_.
@ChrSzegedy @MarkusNRabe @Yuhu_ai_ This is nice code, thanks again! Two rather silly questions: (1) The implementation seems to rely on pmap() SPMD, limited to 8 TPU cores. I guess your 8B models and training code won't be released? (2) Besides perplexity experiments, have you tried few-shot prompting with memory on some benchmark?
@erik_nijkamp @MarkusNRabe @Yuhu_ai_ It is true that this code base can accommodate only models of limited size, but there are simple patches to fix that. We have not tried few-shot prompting on the memory.
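A guess at what such a patch could look like (my assumption, not the authors' actual fix): replace the per-host pmap() data parallelism with jit plus explicit sharding over a device mesh, which is not capped at the 8 local cores. A sketch with illustrative names and a placeholder loss:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Hypothetical data-parallel setup over all available devices, not just 8.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
batch_sharding = NamedSharding(mesh, P("data"))   # shard the batch dimension
replicated = NamedSharding(mesh, P())             # replicate the parameters

def loss_fn(params, batch):
    # Placeholder loss; a real model would go here.
    return jnp.mean((batch @ params) ** 2)

@jax.jit
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    return params - 1e-3 * grads

params = jax.device_put(jnp.ones((8, 8)), replicated)
# Batch size should divide evenly across the devices in the mesh.
batch = jax.device_put(jnp.ones((32, 8)), batch_sharding)
params = train_step(params, batch)
```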
@ChrSzegedy @MarkusNRabe @Yuhu_ai_ We are playing with Meliad and your models, thanks again! Probably very underspecified, but would you have any high-level thoughts (or practical/empirical findings) on the Memorizing Transformer, Block-Recurrent Transformer, S4, Perceiver AR, and RETRO?