Curate MNIST
# !pip install -q 'lamindb[jupyter,aws]' torch torchvision lightning wandb
!lamin init --storage ./lamin-mlops
! using anonymous user (to identify, call: lamin login)
→ initialized lamindb: anonymous/lamin-mlops
import lamindb as ln
from pathlib import Path
ln.context.uid = "EgmnhRJ5Hw1S0000"
ln.context.track()
→ connected lamindb: anonymous/lamin-mlops
→ created Transform('EgmnhRJ5Hw1S0000'), started new Run('31Rl8uOK...') at 2025-01-20 07:34:18 UTC
→ notebook imports: lamindb==1.0.2 torchvision==0.20.1
Download the MNIST dataset and save it in LaminDB so that the training data associated with our model is tracked.
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
dataset = MNIST(Path.cwd() / "download_mnist", download=True, transform=ToTensor())
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/train-images-idx3-ubyte.gz
Extracting /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/train-images-idx3-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/train-labels-idx1-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to /home/runner/work/lamin-mlops/lamin-mlops/docs/download_mnist/MNIST/raw
# the gzipped archives are no longer needed once extracted
!rm download_mnist/MNIST/raw/*.gz
!ls -r download_mnist/MNIST/raw
train-labels-idx1-ubyte t10k-labels-idx1-ubyte
train-images-idx3-ubyte t10k-images-idx3-ubyte
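The four extracted files use the IDX format, whose header records the array shape. Below is a minimal sketch, not part of the original notebook, for checking those shapes with the standard library only. The helper names `parse_idx_header` and `describe_raw_dir` are hypothetical, and the header layout (two zero bytes, a type code, a dimension count, then one big-endian 32-bit size per dimension) is an assumption taken from the IDX format description.

```python
import struct
from pathlib import Path


def parse_idx_header(raw: bytes) -> tuple[int, tuple[int, ...]]:
    """Return (dtype_code, dims) parsed from the leading bytes of an IDX file."""
    # Assumed layout: 2 zero bytes, 1 type-code byte, 1 dimension-count byte.
    zero1, zero2, dtype_code, n_dims = struct.unpack(">BBBB", raw[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX file: the first two bytes must be zero")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    dims = struct.unpack(f">{n_dims}I", raw[4 : 4 + 4 * n_dims])
    return dtype_code, dims


def describe_raw_dir(raw_dir: Path) -> dict[str, tuple[int, ...]]:
    """Map each extracted '*-ubyte' file in raw_dir to the shape in its header."""
    shapes = {}
    for path in sorted(raw_dir.glob("*-ubyte")):
        with path.open("rb") as f:
            # 16 bytes cover the header of an array with up to 3 dimensions.
            _, dims = parse_idx_header(f.read(16))
        shapes[path.name] = dims
    return shapes
```

Calling `describe_raw_dir(Path("download_mnist/MNIST/raw"))` after the download cell should return one shape per file, which is a quick way to confirm that extraction produced complete image and label arrays before saving them.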
training_data_artifact = ln.Artifact(
    "download_mnist/",
    key="testdata/mnist",
    kind="dataset",
).save()
training_data_artifact
Artifact(uid='wkIiXOIhbZI0RaLK0000', is_latest=True, key='testdata/mnist', suffix='', kind='dataset', size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-01-20 07:34:26 UTC)
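The `size` and `n_files` fields reported for the artifact can be sanity-checked with simple arithmetic. The sketch below is an illustration rather than part of the original notebook. It assumes the standard MNIST split sizes (60,000 training and 10,000 test items), 28×28 single-byte pixels, and an IDX header of 4 bytes plus 4 bytes per dimension; `idx_size` is a hypothetical helper name.

```python
# Assumed MNIST split sizes and image size (not stated in the output above).
SPLITS = {"train": 60_000, "t10k": 10_000}
IMAGE_BYTES = 28 * 28  # one unsigned byte per pixel


def idx_size(n_items: int, item_bytes: int, n_dims: int) -> int:
    """Size in bytes of an uncompressed IDX file under the assumed layout."""
    header_bytes = 4 + 4 * n_dims  # magic number plus one uint32 per dimension
    return header_bytes + n_items * item_bytes


# Each split has an image file (3 dimensions) and a label file (1 dimension).
expected_total = sum(
    idx_size(n, IMAGE_BYTES, 3) + idx_size(n, 1, 1) for n in SPLITS.values()
)
expected_n_files = 2 * len(SPLITS)

print(expected_total, expected_n_files)  # 54950048 4
```

Under these assumptions the total matches the reported `size=54950048` and `n_files=4`, which suggests the artifact contains exactly the four uncompressed files and none of the deleted `.gz` archives.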
After the MNIST training dataset is saved in LaminDB, it shows up in LaminHub.
# save your notebook
# ln.context.finish()