The increasing complexity of deep learning models necessitates specialized hardware and software optimizations, particularly for deep learning accelerators. Existing autotuning methods often suffer from prolonged tuning times because they profile invalid configurations, which can cause runtime errors. We introduce ML^2Tuner, a multi-level machine learning tuning technique that improves autotuning efficiency by incorporating a validity prediction model to filter out invalid configurations and an advanced performance prediction model that utilizes hidden features from the compilation process. Experimental results on an extended VTA accelerator demonstrate that ML^2Tuner achieves equivalent performance improvements using only 12.3% of the samples required by a comparable approach in TVM and reduces invalid profiling attempts by an average of 60.8%, highlighting its potential to improve autotuning efficiency by filtering out invalid configurations.
@inproceedings{cha2024ml2tunerefficientcodetuning,title={{ML$^2$}Tuner: Efficient Code Tuning via Multi-Level Machine Learning Models},author={Cha, JooHyoung and Lee, Munyoung and Kwon, Jinse and Lee, Jubin and Lee, Jemin and Kwon, Yongin},eprint={2411.10764},archiveprefix={arXiv},primaryclass={cs.LG},booktitle={Machine Learning for Systems Workshop at NeurIPS},url={https://arxiv.org/abs/2411.10764},doi={10.48550/arXiv.2411.10764},year={2024},month=dec,pages={1--12}}
Deep learning has expanded its footprint across diverse domains. The performance of these computations hinges on the interplay between deep learning compilers and inference libraries. While compilers adapt efficiently to new deep learning operations or models, their tuning processes are too time-consuming. In contrast, inference libraries offer quick execution but with adaptability limitations. To address these challenges, we propose ACLTuner, which optimizes execution configurations using existing inference library kernels. ACLTuner identifies and assigns the optimal kernel through targeted device profiling. Compared to ArmNN, AutoTVM, Ansor, ONNXRuntime, and TFLite, ACLTuner not only achieves up to 2.0x faster execution time across seven deep learning models, but also reduces the average tuning time by 95%.
@inproceedings{kwon2023acltuner,title={{ACLT}uner: A Profiling-Driven Fast Tuning to Optimized Deep Learning Inference},author={Kwon, Yongin and Cha, JooHyoung and Lee, Jubin and Yu, Misun and Park, Jeman and Lee, Jemin},booktitle={Machine Learning for Systems Workshop at NeurIPS},year={2023},month=dec,url={https://openreview.net/forum?id=k0FIPHpeR4},pages={1--12}}
KIICE Workshop
**Distinguished Paper** Design of Efficient Virtual Desktop Infrastructure based on Super Resolution and NPU
JooHyoung Cha, Hyunjun Park, Miseon Im, and 4 more authors
In Artificial Intelligence and Applied Workshop at KIICE, Sep 2023
Recently, there has been a growing demand to apply deep learning in embedded environments. In constrained embedded environments, heterogeneous multi-core CPU architectures such as Arm's big.LITTLE are widely used to carry out deep learning computations efficiently. Although Arm provides the Arm Compute Library (ACL) for optimized deep learning operations, it does not fully exploit the potential of hardware with the big.LITTLE structure. This paper proposes a profile-based search method that automatically determines the optimal execution kernel and schedule for each hardware configuration. Experiments were conducted on Tinker Edge R, Odroid N2+, and Snapdragon 865 HDK boards using the AlexNet, VGG16, MobileNetV2, and GoogLeNet models. In all cases, the proposed method improved performance by up to 266% compared to existing methods. We expect these results to enable cost-effective, low-power, and high-performance execution of deep learning on embedded devices.
@article{cha2022pgoacltuner,title={Profiling-based ArmCL Optimal Schedule Search for Single-ISA Heterogeneous Multi-Core Architectures},author={Cha, JooHyoung and Kwon, Yongin and Lee, Jemin},journal={Journal of The Institute of Electronics and Information Engineers},publisher={IEIE},issn={2287-5026},volume={60},number={7},doi={10.5573/ieie.2023.60.7.40},url={http://journal.auric.kr/ieie/ArticleDetail/RD_R/423544},year={2022},month=jul,pages={40--49}}
2019
ICFICE 2019
An Effective Method for Generating Color Images Using Genetic Algorithm
JooHyoung Cha and Young Woon Woo
In 2019 INTERNATIONAL CONFERENCE ON FUTURE INFORMATION & COMMUNICATION ENGINEERING Vol.11 No.1, Jun 2019
In this paper, we propose two methods for automatically generating color images similar to existing images using genetic algorithms. Experiments were performed with each of the proposed methods on color images of two different sizes (256x256 and 512x512). The experimental results show that evolving the image as a collection of sub-images is far more effective than modeling it as a single gene, and that the generated images are much more refined. These results indicate that gene modeling must be chosen carefully, whether the goal is to generate an image similar to an existing one or to quickly and naturally generate an image synthesized from different images.
@inproceedings{cha201906imagegenusingga,title={An Effective Method for Generating Color Images Using Genetic Algorithm},author={Cha, JooHyoung and Woo, Young Woon},booktitle={2019 INTERNATIONAL CONFERENCE ON FUTURE INFORMATION \& COMMUNICATION ENGINEERING Vol.11 No.1},number={1},volume={11},url={https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE08747485},year={2019},month=jun,pages={1--4}}