Blog

2022 Spring Recruitment Has Begun

About the DADI Storage Accelerator Team

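The Alibaba Cloud Storage Accelerator (DADI) team is responsible for speeding up access to a wide range of storage services. With the development of distributed storage technology and the spread of compute-storage disaggregated architectures, applications commonly need some form of acceleration to achieve higher data-access performance or lower storage cost. This is the team's mission.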

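Our main techniques include, but are not limited to, general-purpose file caching, P2P-based dispersal of hotspots, tapping the performance gains of new hardware, and targeted custom optimizations. The team's output is applied directly in Alibaba Group's production systems to support a variety of businesses, and some of its innovative results have been published at top international conferences and journals such as ATC, EuroSys, TPDS, ToS, and Infocom.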

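The team currently focuses on evolving image technology for container platforms and cloud-native systems, and on developing and delivering distributed cache and P2P products.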

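The team's work not only guarantees the performance targets of Alibaba Cloud's storage products but also directly supports the architectural evolution of Alibaba Group's and Alibaba Cloud's business systems.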

Team Lead

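Huiba Li, Ph.D., is a Senior Technical Expert at Alibaba Cloud and holds multiple invention patents. His research has been published at Chinese and international conferences and journals including USENIX ATC, EuroSys, IEEE Infocom, IEEE ToS, IEEE TPDS, Science China, and Journal of Software, and he has reviewed submissions for leading journals and conferences such as IEEE TSC, IEEE TPDS, JPDC, IJCNN, and ICC-NGN.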


DADI: Block-Level Image Service for Agile and Elastic Application Deployment

Authors

Huiba Li, Yifan Yuan, Rui Du, Kai Ma, Lanzheng Liu, and Windsor Hsu, Alibaba Group

Abstract

Businesses increasingly need agile and elastic computing infrastructure to respond quickly to real-world situations. By offering efficient process-based virtualization and a layered image system, containers are designed to enable agile and elastic application deployment. However, creating or updating large container clusters is still slow due to the image downloading and unpacking process. In this paper, we present DADI Image Service, a block-level image service for increased agility and elasticity in deploying applications. DADI replaces the waterfall model of starting containers (downloading image, unpacking image, starting container) with fine-grained on-demand transfer of remote images, realizing instant start of containers. DADI optionally relies on a peer-to-peer architecture in large clusters to balance network traffic among all the participating hosts. DADI efficiently supports various kinds of runtimes including cgroups, QEMU, etc., further realizing "build once, run anywhere". DADI has been deployed at scale in the production environment of Alibaba, serving one of the world's largest e-commerce platforms. Performance results show that DADI can cold start 10,000 containers on 1,000 hosts within 4 seconds.
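The abstract's point about balancing network traffic in large clusters can be illustrated with a simple distribution tree. The following is a minimal Python sketch under our own assumptions, not DADI's actual P2P protocol: hosts are arranged in a k-ary tree, only the root downloads from the registry, and every other host downloads from its parent peer. The names `build_tree`, `upload_load`, and `fanout` are hypothetical.

```python
# Hypothetical illustration (not DADI's actual protocol): hosts form a k-ary
# tree; only the root pulls from the registry, every other host pulls from
# its parent peer, so no single node serves more than `fanout` downloads.
from typing import Dict, List, Optional


def build_tree(hosts: List[str], fanout: int) -> Dict[str, Optional[str]]:
    """Return a parent map; hosts[0] has parent None (it uses the registry)."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    parents: Dict[str, Optional[str]] = {}
    for i, host in enumerate(hosts):
        # Heap-style layout: the parent of host i is host (i - 1) // fanout.
        parents[host] = None if i == 0 else hosts[(i - 1) // fanout]
    return parents


def upload_load(parents: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Count downloads served by each source; the root's source is "registry"."""
    load: Dict[str, int] = {}
    for parent in parents.values():
        source = "registry" if parent is None else parent
        load[source] = load.get(source, 0) + 1
    return load


if __name__ == "__main__":
    hosts = [f"host-{i}" for i in range(1000)]
    load = upload_load(build_tree(hosts, fanout=4))
    print("registry serves:", load["registry"])           # 1 instead of 1000
    print("max load on any source:", max(load.values()))  # equals the fanout
```

With 1,000 hosts and a fan-out of 4, the registry serves a single download instead of 1,000 and no peer uploads to more than four children; the trade-off in this sketch is that a deeper tree adds forwarding hops before data reaches the leaves.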


Accelerated Container Image

Overview

Accelerated Container Image is a sub-project of containerd. It is a core technology of Alibaba and was published as DADI: Block-Level Image Service for Agile and Elastic Application Deployment (USENIX ATC '20). Accelerated Container Image was open-sourced in March 2021 and became a containerd sub-project in November 2021.

At the heart of the acceleration is overlaybd, a new remote image format based on block devices. It accelerates images by fetching image data on demand, without downloading and unpacking the whole image before the container starts. With the overlaybd image format, a container can be cold-started instantly.
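On-demand fetching means a read pulls only the blocks it overlaps, caching them locally for later reads. The Python sketch below illustrates that idea under stated assumptions and is not overlaybd's real design: the remote image is split into fixed-size blocks, and `fetch_remote(index)` is a hypothetical callback standing in for a ranged request to the registry. `LazyBlockImage` and `BLOCK_SIZE` are likewise illustrative names.

```python
# Minimal sketch of on-demand (lazy) block fetching. Assumptions, not
# overlaybd's real design: the remote image is split into fixed-size blocks,
# and `fetch_remote(index)` is a hypothetical callback standing in for a
# ranged request to the registry that returns one full block.
from typing import Callable, Dict

BLOCK_SIZE = 4096  # illustrative block size


class LazyBlockImage:
    def __init__(self, size: int, fetch_remote: Callable[[int], bytes]):
        self.size = size
        self.fetch_remote = fetch_remote
        self.cache: Dict[int, bytes] = {}  # blocks already pulled to this host

    def _block(self, index: int) -> bytes:
        # Only the first access to a block goes over the network.
        if index not in self.cache:
            self.cache[index] = self.fetch_remote(index)
        return self.cache[index]

    def read(self, offset: int, length: int) -> bytes:
        """Return `length` bytes at `offset`, fetching only overlapping blocks."""
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read out of range")
        out = bytearray()
        pos, end = offset, offset + length
        while pos < end:
            index, within = divmod(pos, BLOCK_SIZE)
            chunk = self._block(index)[within : within + (end - pos)]
            if not chunk:
                raise IOError(f"remote returned a short block {index}")
            out += chunk
            pos += len(chunk)
        return bytes(out)


if __name__ == "__main__":
    fetched = []

    def fake_remote(index: int) -> bytes:
        fetched.append(index)
        return bytes([index % 256]) * BLOCK_SIZE

    image = LazyBlockImage(size=1 << 30, fetch_remote=fake_remote)  # 1 GiB image
    print(image.read(8190, 4))  # spans the end of block 1 and start of block 2
    print(fetched)              # only those two blocks were fetched
```

Starting a container then costs only the blocks it actually reads, in contrast to downloading and unpacking the full image first.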

Accelerated Container Image contains the containerd snapshotter and conversion tools for overlaybd images.

Overlaybd is the name of our image format, a block-device-based image format. The overlaybd repository contains the storage backend of Accelerated Container Image, which provides a merged view of a sequence of block-based layers as a single block device.
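A merged view of block-based layers can be pictured as a per-block lookup in which the topmost layer that holds a block wins. The following Python sketch shows that rule under our own assumptions, not overlaybd's on-disk format: each layer is a sparse dict from block index to contents, layers are ordered bottom to top, and blocks present in no layer read as zeros. `MergedBlockDevice` is a hypothetical name.

```python
# Minimal sketch of a merged view over a sequence of block-based layers.
# Assumptions (illustrative, not overlaybd's on-disk format): each layer is a
# sparse map from block index to block contents, layers are ordered bottom
# (base) to top, a block in a higher layer shadows the same block below it,
# and a block present in no layer reads as zeros.
from typing import Dict, List

BLOCK_SIZE = 4096
Layer = Dict[int, bytes]


class MergedBlockDevice:
    def __init__(self, layers: List[Layer]):
        self.layers = layers  # layers[0] is the bottom layer
        self.index = self._merge_index()

    def _merge_index(self) -> Dict[int, int]:
        """Map each block to the topmost layer that holds it, computed once."""
        index: Dict[int, int] = {}
        for depth, layer in enumerate(self.layers):
            for block in layer:
                index[block] = depth  # higher layers overwrite lower entries
        return index

    def read_block(self, block: int) -> bytes:
        depth = self.index.get(block)
        if depth is None:
            return bytes(BLOCK_SIZE)  # never written in any layer
        return self.layers[depth][block]


if __name__ == "__main__":
    base = {0: b"A" * BLOCK_SIZE, 1: b"B" * BLOCK_SIZE}
    update = {1: b"b" * BLOCK_SIZE}
    device = MergedBlockDevice([base, update])
    print(device.index)  # block 0 comes from layer 0, block 1 from layer 1
```

Precomputing the merged index keeps each read to a single lookup regardless of how many layers the image has; scanning the layers top-down on every read would also be correct but would cost time proportional to the layer count.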

Block-based image

Existing lazy-pulling image formats are filesystem-based. However, implementing a POSIX-compliant file system interface and exposing it via the OS kernel is complex, and the choice of underlying filesystem is restricted in current filesystem-based image services.

As a complement, overlaybd provides a virtual block device for container images that supports lazy pulling while offering full POSIX compliance and support for multiple filesystems.

Its main features are as follows:

  • High Performance

    It is a block-device-based storage format for OCI images, with much lower complexity than filesystem-based implementations. For example, cross-layer hard links and non-copy operations such as chown are very complex to support in filesystem-based images without copy-up, but are natively supported by overlaybd. Overlaybd delivers higher performance than filesystem-based solutions.

  • High Reliability

    Overlaybd exposes virtual block devices through TCMU (TCM in userspace), which is widely used and supported in most operating systems. The overlaybd backstore can recover from failures or crashes, which is difficult for FUSE-based image formats.

  • Security

    A block-based solution has a small attack surface.

  • Efficient Virtualization Support

    Passing a block device from the host to a microVM (such as Kata Containers) via virtio-blk usually incurs less performance overhead. Alternatively, passing the rootfs from the host to the microVM via virtio-fs is also supported once overlaybd has been mounted.

  • Native Support for Writable

    Overlaybd can be used as a writable (container) layer, so end users can build overlaybd images natively without conversion.

  • Multiple File System Supported

    Overlaybd is independent of the underlying filesystem, so users can choose whichever filesystem suits their images, such as XFS, Btrfs, ZFS, or even NTFS. This makes it possible to run Windows containers on a Linux host, or vice versa.
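The writable-layer feature, together with the point that operations such as chown need no whole-file copy-up, follows from copy-on-write at block granularity. The following Python sketch illustrates the idea under our own assumptions and is not overlaybd's implementation: layers are sparse dicts, a write copies each touched block into the writable layer and modifies it there, and `commit` freezes the writable layer as a new read-only layer. `WritableBlockDevice`, `write`, and `commit` are hypothetical names.

```python
# Minimal sketch of a writable top layer over read-only block layers.
# Assumptions (illustrative only): layers are sparse maps from block index to
# bytes; a write copies each touched block into the writable layer and
# modifies it there, so a metadata change such as chown adds only the blocks
# the filesystem rewrites, never a whole-file copy.
from typing import Dict, List

BLOCK_SIZE = 4096
Layer = Dict[int, bytes]


class WritableBlockDevice:
    def __init__(self, lower: List[Layer]):
        self.lower = lower      # read-only layers, bottom first
        self.upper: Layer = {}  # writable (container) layer

    def read_block(self, index: int) -> bytes:
        if index in self.upper:
            return self.upper[index]
        for layer in reversed(self.lower):  # topmost read-only layer wins
            if index in layer:
                return layer[index]
        return bytes(BLOCK_SIZE)  # unwritten block reads as zeros

    def write(self, offset: int, data: bytes) -> None:
        """Write `data` at byte `offset`, copying up only the blocks it touches."""
        if offset < 0:
            raise ValueError("negative offset")
        pos, end = offset, offset + len(data)
        while pos < end:
            index, within = divmod(pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - within, end - pos)
            block = bytearray(self.read_block(index))  # copy-on-write of one block
            block[within : within + n] = data[pos - offset : pos - offset + n]
            self.upper[index] = bytes(block)
            pos += n

    def commit(self) -> Layer:
        """Freeze the writable layer as a new read-only layer (hypothetical)."""
        layer, self.upper = self.upper, {}
        self.lower.append(layer)
        return layer


if __name__ == "__main__":
    base = {0: b"A" * BLOCK_SIZE, 1: b"B" * BLOCK_SIZE}
    device = WritableBlockDevice([base])
    device.write(BLOCK_SIZE - 2, b"xyzw")  # straddles blocks 0 and 1
    print(sorted(device.upper))            # only the two touched blocks
    print(base[0] == b"A" * BLOCK_SIZE)    # the base layer is unchanged
```

Because the writable layer has the same shape as the read-only ones, committing it yields a new image layer directly, which loosely mirrors building an image in overlaybd format without a separate conversion step.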
