Software and data culture for AI system integration



In Taiwan, industry does not place much value on software engineering and data science. Most companies are strong in chip manufacturing and hardware production. With the AI wave coming, are Taiwanese companies ready to catch up? Whether top managers can change their mindset and keep learning will decide whether Taiwan keeps pace with the global trend. Essentially, AI is software that analyzes data.
Fig 1. Machine Learning vs. Manufacturing (image source)

Data science (machine learning research) is built on top of the work of ML engineers and data engineers, and that is before counting the most common role, the software engineer. The data engineer is the foundation: this role collects user behavior records from IoT devices, web platforms, and apps, and stores them completely in a data lake. In a recent survey of AI talent by the Taiwan AI Academy, poor data quality and hard-to-obtain data were the most commonly reported difficulties. When Jason worked at eBay as a software engineer, the Analytics Platform department he belonged to had more than 200 people maintaining the data platform. They handled data collection, data cleaning, ETL jobs, A/B testing, ML model development, and so on.
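The data cleaning step mentioned above can be illustrated with a minimal sketch. The event records, field names, and rejection rules here are all hypothetical and only show the kind of checks a data engineer might run before data lands in a data lake; a real pipeline would be far larger.

```python
from datetime import datetime

# Hypothetical raw events, as they might arrive from IoT devices, a web platform, and an app.
RAW_EVENTS = [
    {"user_id": "u1", "source": "app", "event": "click", "ts": "2018-11-01T10:00:00"},
    {"user_id": "u1", "source": "app", "event": "click", "ts": "2018-11-01T10:00:00"},  # exact duplicate
    {"user_id": "", "source": "web", "event": "view", "ts": "2018-11-01T10:05:00"},     # missing user
    {"user_id": "u2", "source": "iot", "event": "reading", "ts": "not-a-timestamp"},    # bad timestamp
    {"user_id": "u2", "source": "web", "event": "view", "ts": "2018-11-01T10:07:30"},
]


def clean_events(raw_events):
    """Drop records with a missing user or an unparseable timestamp, then de-duplicate."""
    seen = set()
    cleaned = []
    rejected = 0
    for record in raw_events:
        if not record.get("user_id"):
            rejected += 1
            continue
        try:
            ts = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%S")
        except (KeyError, ValueError):
            rejected += 1
            continue
        key = (record["user_id"], record.get("source"), record.get("event"), ts)
        if key in seen:
            rejected += 1
            continue
        seen.add(key)
        cleaned.append({**record, "ts": ts})
    return cleaned, rejected


cleaned, rejected = clean_events(RAW_EVENTS)
print(f"kept {len(cleaned)} of {len(RAW_EVENTS)} events, rejected {rejected}")
```

Even in this toy input, more than half of the records are unusable, which is the data quality problem the survey points to.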

Fig 2. Machine Learning Skills Pyramid (image source)


Before introducing AI, why not first think about what “human intelligence” can do? Take video surveillance, image object identification, audio recognition, and self-driving as examples. If people did those tasks by hand, would the resulting service be valuable? If the service creates real value and improves quality, we can then start automating the process with AI.

Taiwan produces many well-trained Computer Science graduates every year, but most of them join the hardware supply chain. It was not until 2018 that international tech companies such as Google, Microsoft, Amazon, IBM, Line, and Oath Yahoo started hiring AI engineers. Hopefully, this will bring some change. In the new field of AI, Taiwan does not have many people with years of combined software and data experience, so engineers who are new to AI often spend time stumbling through problems. Even managers who are aware of AI and have hired AI engineers and researchers often do not know how to improve things when progress is slow, the effect is weak, or results fall short of expectations. The project then ends in failure.


Some people worry that Taiwan is a small country with no data advantage. But if you look at how fast data is growing worldwide, most of it was collected recently, and the volume is growing exponentially. Even data that was collected earlier is often not clean enough for analysis, to say nothing of there being too little of it.

The services AI brings can help an enterprise increase value internally. For example, telecom operators want to use AI to reduce bad debt from carrier billing on Google Play. Banks are hiring AI specialists to analyze customer feedback with NLP. People now spend more time interacting with their bank through web banking and mobile banking apps than in physical branches, and banks also interact with customers through Line. As more customer-facing channels collect data, that data is consolidated into a single analytics platform. By analyzing how customers use the services, banks can recommend products and see more clearly which functions customers actually use. Predicting behavior patterns with ML can then lead to better service quality.

Traditional statistical models and open-source models found online can only bring performance to a certain level. To break through, a team needs an ML mindset and must build its own model. An AI system rarely performs well when it is first designed. It has to be tuned continuously through feature extraction, model selection, and parameter tuning. Take the globally known ImageNet Challenge as an example: in 2015, AI surpassed human performance in classification error, and humans have not been able to catch up since. AlphaGo is another example. It was not better than human players at first, but through the collaboration of ML researchers, ML engineers, and data engineers, it eventually beat them.
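The tuning loop described above can be sketched in a few lines. This is a minimal illustration, assuming a toy k-nearest-neighbours classifier on synthetic 2-D data; the data generator, the candidate values of k, and the hold-out split are all assumptions chosen for brevity, not a method from the original post.

```python
import random
from collections import Counter


def make_data(n, seed):
    """Synthetic 2-D points: class 1 is centred at (2, 2), class 0 at (0, 0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else 0.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out points that the classifier labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(held_out)


train = make_data(200, seed=0)
validation = make_data(100, seed=1)

# Parameter tuning: try several odd values of k (odd avoids tied votes with two classes)
# and keep the one that scores best on data the model has not seen.
candidates = [1, 3, 5, 9, 15]
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

The same pattern, scoring each candidate on held-out data and keeping the best, applies whether the candidates are parameter values, feature sets, or entirely different model families.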

Fig 3. ImageNet Challenge Trend (image source)

Some chip makers in Taiwan produce sensors. With strong system integration skills, and with software and hardware well integrated in the data collection pipeline, they have a good chance of selling products and services worldwide. Data can be consolidated in a data center or on a cloud platform and then combined with ML algorithms and a data platform to address specific application scenarios. This could help Taiwan turn its strength in manufacturing into end-to-end total solutions that integrate software and hardware and solve customers’ pain points. Two startups in video surveillance security are good examples: Umbo CV (B2B) sells to other businesses, and Deep Sentinel (B2C) sells to end consumers.


AI in healthcare can ease doctors’ workload in reading medical images. Senior doctors are usually the ones asked to read these images. If AI can learn from their experience, healthcare quality can improve while requiring less manpower. There was a report that Stanford researchers achieved good results in diagnosing Alzheimer’s disease from medical images, making a diagnosis possible years in advance. In manufacturing, IoT devices can be installed in factories for Industry 4.0. AI may reduce the manpower needed for defect inspection, and AI-based maintenance prediction may reduce the risk of machines having to stop.

Software engineers in Taiwan are excellent; otherwise, international technology companies would not come here to recruit. If AI systems are designed with a good architecture that integrates software and hardware, Taiwan still has a good chance of catching up with the global AI wave. As a concrete example, the most popular apps voted on Google Play in 2018 included one from a small Taiwanese development team of no more than 10 people (see the referenced report). Software development capability in Taiwan is strong. What remains is whether senior managers can see how much value software engineers bring. Software engineers should be valued.

Machine learning has a training phase and a prediction phase. Taiwan has many companies building hardware embedded systems, and with a hardware mindset they are happy to see edge computing as their way into AI. However, the real value lies in higher recognition and prediction accuracy, which requires training on a big data platform and continuous time spent on tuning. The learning phase still depends on software and big data; once trained, the model can be compressed and deployed to the edge. Take voice recognition applications such as smart speakers or real-time translators: they need a large vocabulary collected for training. Word choice in Taiwan differs from that in mainland China, so data is regional. New vocabulary also appears over time. All of this requires software that keeps learning and updating before each redeployment to the edge.
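The split between the two phases can be sketched as follows. This is a minimal illustration under stated assumptions: a nearest-centroid classifier stands in for a real speech model, the training data is made up, and "compression" is simple fixed-point rounding with a hypothetical `scale` parameter. It only shows that the heavy work (training) and the light work (prediction from a compact model) can live on different machines.

```python
import json


def train_centroids(samples):
    """Training phase (server side): average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def compress(model, scale=10):
    """Shrink the model by storing each weight as a small integer (value * scale, rounded)."""
    return {
        "scale": scale,
        "centroids": {label: [round(v * scale) for v in c] for label, c in model.items()},
    }


def edge_predict(blob, features):
    """Prediction phase (edge side): load the compact model and pick the nearest centroid."""
    packed = json.loads(blob)
    scale = packed["scale"]

    def distance(centroid):
        return sum((f - c / scale) ** 2 for f, c in zip(features, centroid))

    return min(packed["centroids"], key=lambda label: distance(packed["centroids"][label]))


# Hypothetical training data: (feature vector, label).
training_data = [
    ([0.9, 0.1], "yes"), ([1.1, 0.2], "yes"), ([0.8, 0.0], "yes"),
    ([0.1, 1.0], "no"), ([0.2, 0.9], "no"), ([0.0, 1.2], "no"),
]

model = train_centroids(training_data)   # done where the data and compute are
blob = json.dumps(compress(model))       # the artifact that is shipped to the device
print(edge_predict(blob, [1.0, 0.1]))
print(edge_predict(blob, [0.1, 1.1]))
```

When new vocabulary or regional usage appears, only the training side needs the new data; the device receives a fresh `blob` and keeps running the same prediction code.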

Fig 4. Training and Prediction System (image source)

Manufacturing focuses on making hardware components fit together, with each component doing its job according to spec. Software and data services must consider the whole system end to end; otherwise it is easy to end up with garbage in, garbage out. The value of a software service lies in finding pain points and solving them. The future trend will be led by software delivering service value. Netflix, Spotify, iQiyi, and cloud services all bill monthly, much like the familiar water, electricity, and mobile data bills.


Introducing AI is very likely to burn cash. AI engineers are hard to recruit, or rather, few people have AI experience. Companies spend money sending engineers and managers to AI training, yet projects can still produce nothing afterwards and fail to deliver what was expected. Many things have to be done right for an AI project to succeed. Enterprises thinking about bringing AI into their workflow may be better off finding professional AI architects with several years of experience to help with planning and reduce risk.


Take self-driving cars as an example. The R&D is very expensive. It requires collecting a large amount of video data and using ML to make predictions in complicated scenes. US technology companies have already spent a great deal of money, manpower, and resources on this R&D. Everything is possible. It depends on how many resources senior managers are willing to commit, what kind of problem they want to solve, and how long the investment takes to pay back. Investors should not look only for quick money; that is too short-sighted.



Tips for getting hired as AI research engineer


◊Join Kaggle competitions and develop machine learning algorithms.

◊Join AI hackathons and keep the finished project results.

◊Use Jupyter Notebook to practice the statistics and machine learning algorithms used in data science.

◊Demo a personal, independent AI project and put the source code on GitHub.

◊Write about your AI/ML learning experience on a blog (e.g. LinkedIn, Medium, WordPress, Blogger).

◊Earn an AI/ML-related degree. It could be a Bachelor’s, Master’s, or PhD.

◊Publish AI/ML research papers.

◊Set up a personal website, or use a hosting site, for your AI portfolio.
[Ex] Example portfolio sites for reference:
Jason Chuang
Hammad A Usmani