
How Amazon provides an analysis environment to data scientists and analysts using Amazon AppStream 2.0

Written By notebooktabletphone

Our challenge

On February 28, 2020, Amazon announced that it had taken measures to protect the health of its employees and communities. These included canceling large-scale events, holding meetings with stakeholders online, and suspending tours of fulfillment centers. At the time of writing, Amazon has invested more than $8 billion in safety measures against COVID-19.

To complement these safety measures, Amazon needed to predict the spread and risk of COVID-19 at each site. This prediction required interactive reports and machine learning models. Amazon therefore set up a secure data lake for storing highly confidential data, along with a global-scale, isolated analysis environment. Building such a data lake involves contradictory requirements: on the one hand, the data must be stored securely and kept isolated; on the other hand, it must be made available to the consumers who intend to use it.

The architecture of this solution needed to meet the following security requirements:

To provide access to such an environment, Amazon adopted VDI (Virtual Desktop Infrastructure) as a solution. With VDI, only the screen rendering is streamed to the user, and the data itself is never stored on the user's device. Isolating the work environment of Amazon's data scientists and data analysts enhances security. In addition, placing the tools close to the data improves performance.

Amazon considered whether to build a solution on Amazon Elastic Compute Cloud (Amazon EC2) or to use an AWS managed service such as Amazon AppStream 2.0 or Amazon WorkSpaces, and ultimately adopted AppStream 2.0.

What is Amazon AppStream 2.0?

Amazon AppStream 2.0 is a managed, non-persistent application and desktop streaming service. With AppStream 2.0, you can centrally manage desktop applications and securely deliver them to a variety of devices. You can scale to any number of users around the world without acquiring, provisioning, or operating hardware or infrastructure. AppStream 2.0 is built on the AWS Cloud and benefits from data centers and a network architecture designed for security-sensitive organizations.

AppStream 2.0 is a non-persistent solution. With its image-based approach to managing application updates and operating system patches, AppStream 2.0 was the best fit for Amazon's needs. Administrators can prepare the environment for data scientists and data analysts without spending many hours on updates.

In addition, AppStream 2.0 can use an IAM role through an instance profile, so it was able to provide access to various AWS services without the need for access keys and secret keys. This satisfied one of the important security requirements. The native audit capability also made it easy to meet Amazon's security audit requirements.
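As a minimal sketch of what keyless access looks like from inside a streaming instance: when an IAM role is attached to a fleet, AppStream 2.0 exposes temporary credentials through a credential profile named `appstream_machine_role`. The bucket name, the region default, and the helper names below are hypothetical and only illustrate the idea; this is not Amazon's actual code.

```python
# Credential profile that AppStream 2.0 creates on the streaming instance
# when an IAM role is attached to the fleet or image builder.
APPSTREAM_PROFILE = "appstream_machine_role"


def session_kwargs(region_name):
    """Arguments for boto3.Session; no access key or secret key is involved."""
    return {"profile_name": APPSTREAM_PROFILE, "region_name": region_name}


def list_bucket_keys(bucket, region_name="us-east-1"):
    """List object keys in a bucket (hypothetical name) using the role's
    temporary credentials. Returns at most one page of results."""
    import boto3  # imported lazily so session_kwargs works without boto3

    s3 = boto3.Session(**session_kwargs(region_name)).client("s3")
    response = s3.list_objects_v2(Bucket=bucket)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

Because the credentials come from the profile, nothing secret has to be baked into the image or handed to users.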


Finally, because Auto Scaling automatically scales the environment down when it is not in use, Amazon was able to build a cost-effective solution.

After deciding to use AppStream 2.0, Amazon was able to complete a proof of concept in one week. Building a VDI environment on EC2 would have taken much longer: the team would have had to build servers and streaming gateways, weigh the advantages and drawbacks of third-party VDI solutions, and manage everything themselves.

Solution
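The scaling behavior described above can be sketched with Application Auto Scaling, which is how AppStream 2.0 fleets are scaled. The fleet name, capacity limits, and 75% utilization target below are illustrative assumptions, not values from Amazon's deployment.

```python
def fleet_scaling_requests(fleet_name, min_capacity=1, max_capacity=10,
                           target_utilization=75.0):
    """Build request parameters for a target-tracking policy on a fleet.

    All numeric values are assumed defaults chosen for illustration.
    """
    resource_id = f"fleet/{fleet_name}"
    dimension = "appstream:fleet:DesiredCapacity"
    target = {
        "ServiceNamespace": "appstream",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{fleet_name}-utilization-tracking",
        "ServiceNamespace": "appstream",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "AppStreamAverageCapacityUtilization"
            },
        },
    }
    return target, policy


def apply_fleet_scaling(fleet_name):
    """Register the fleet as a scalable target and attach the policy."""
    import boto3  # imported lazily; only needed when calling AWS

    client = boto3.client("application-autoscaling")
    target, policy = fleet_scaling_requests(fleet_name)
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

With a target-tracking policy, capacity shrinks toward `MinCapacity` as utilization falls, which is what keeps idle cost low.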

By adopting AppStream 2.0 for the COVID-19 project, Amazon enabled data scientists and data analysts to access isolated data in a secure way. Users can access the data only while connected to our corporate network. AppStream 2.0 also lets administrators disable file transfer, printing, and copy and paste, which prevents users from moving highly confidential data to their own devices.

The following figure shows an overview of this solution.

To implement this solution, the project team used Amazon's existing SAML authentication as the entry point to AppStream 2.0. This not only requires multi-factor authentication before a user is granted access, but also prevents users from reaching the environment from outside the corporate network.

Because AppStream 2.0 supports Windows, data analysts can keep using applications they are already familiar with. This reduced rework and retraining. For example, data analysts can run Amazon Redshift queries through an ODBC driver and analyze the data for reports that management requires.

With Amazon SageMaker, data scientists can use Jupyter notebooks in this secure environment, but they cannot copy notebooks or cells out of it. This lets data scientists build machine learning models on sensitive COVID-19 data sources while still meeting the security requirement that data cannot be downloaded to local devices.

In addition, the AppStream 2.0 API is integrated with AWS Lambda to check whether a user's session identifier has been spoofed. This adds another layer of user verification and helps prevent data leakage, ensuring that the user accessing the data is who they claim to be.
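As a rough sketch of how an administrator might turn off these data-transfer paths, the AppStream 2.0 `UpdateStack` API accepts a list of user settings, each pairing an action with a permission. The stack name and helper names are hypothetical; the exact set of actions Amazon disabled is not stated in this post, so the list below is an assumption.

```python
# Actions that move data between the streaming session and the local device.
RESTRICTED_ACTIONS = (
    "CLIPBOARD_COPY_FROM_LOCAL_DEVICE",
    "CLIPBOARD_COPY_TO_LOCAL_DEVICE",
    "FILE_UPLOAD",
    "FILE_DOWNLOAD",
    "PRINTING_TO_LOCAL_DEVICE",
)


def locked_down_user_settings():
    """Build the UserSettings list with every restricted action disabled."""
    return [{"Action": action, "Permission": "DISABLED"}
            for action in RESTRICTED_ACTIONS]


def lock_down_stack(stack_name):
    """Apply the settings to an existing stack (stack_name is hypothetical)."""
    import boto3  # imported lazily; only needed when calling AWS

    boto3.client("appstream").update_stack(
        Name=stack_name,
        UserSettings=locked_down_user_settings(),
    )
```

Keeping the settings in one function makes it easy to review exactly which paths are closed before applying them to a stack.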
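The post does not describe how this check is implemented. One plausible shape, sketched here under that caveat, is a Lambda function that looks up the user's sessions with the AppStream 2.0 `DescribeSessions` API and confirms that the claimed session ID really belongs to the claimed user. The event keys and function names are hypothetical, and pagination is ignored for brevity.

```python
def session_belongs_to_user(sessions, session_id, user_id):
    """Return True only if an ACTIVE session with this ID is owned by user_id.

    `sessions` is a list of dicts shaped like the Sessions array returned by
    DescribeSessions (keys Id, UserId, State).
    """
    for session in sessions:
        if session.get("Id") == session_id:
            return (session.get("UserId") == user_id
                    and session.get("State") == "ACTIVE")
    return False  # unknown session ID: treat it as spoofed


def lambda_handler(event, context):
    """Hypothetical Lambda entry point; the event keys are illustrative."""
    import boto3  # provided by the Lambda Python runtime

    response = boto3.client("appstream").describe_sessions(
        StackName=event["stack_name"],
        FleetName=event["fleet_name"],
        UserId=event["user_id"],
        AuthenticationType="SAML",
    )
    allowed = session_belongs_to_user(
        response.get("Sessions", []), event["session_id"], event["user_id"]
    )
    return {"allowed": allowed}
```

Treating an unknown session ID as a failure is the conservative choice: access is denied unless the service itself confirms the pairing.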

Environment audit

As mentioned in the previous section, one of the requirements was comprehensive auditing of the environment. Amazon needed to know who entered this environment and who touched which data. If a security event occurs in the AppStream 2.0 environment, our security team can fully trace all activity. To achieve this, AWS CloudTrail logs and AppStream 2.0 usage reports are ingested into a centralized log store. This allows our security analysts and incident response teams to determine exactly who logged on to our environment and every action performed during the session.

Examples of the logs collected are described below.

First, logon information is collected from CloudTrail, which provides the user ID of the user who logged on. Next, Amazon S3 PUT events are collected from CloudTrail to obtain the IP address of the AppStream 2.0 instance. Finally, the AppStream 2.0 usage reports are collected to obtain the instance IP address and the user ID. This makes it possible to link an activity in Amazon S3 to the user ID that performed it.
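The correlation described above can be sketched as a join on instance IP address and time window. The field names follow the CloudTrail record format (`eventName`, `eventTime`, `sourceIPAddress`, `requestParameters`) and what the AppStream 2.0 sessions report is assumed to contain (`user_id`, `eni_private_ip_address`, `session_start_time`, `session_end_time`); check them, and the timestamp format, against the actual report schema before relying on this. It also assumes S3 traffic reaches CloudTrail with the instance's private IP as the source address.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2020-05-01T10:15:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def attribute_s3_puts(cloudtrail_events, usage_sessions):
    """Attach a user ID to each S3 PutObject event.

    An event matches a session when the source IP equals the session's
    instance IP and the event time falls inside the session window.
    Events with no matching session get user_id None.
    """
    attributed = []
    for event in cloudtrail_events:
        if event.get("eventName") != "PutObject":
            continue
        when = parse_ts(event["eventTime"])
        user_id = None
        for session in usage_sessions:
            same_ip = session["eni_private_ip_address"] == event["sourceIPAddress"]
            start = parse_ts(session["session_start_time"])
            end = parse_ts(session["session_end_time"])
            if same_ip and start <= when <= end:
                user_id = session["user_id"]
                break
        params = event.get("requestParameters", {})
        attributed.append({
            "user_id": user_id,
            "bucket": params.get("bucketName"),
            "key": params.get("key"),
            "event_time": event["eventTime"],
        })
    return attributed
```

The time window matters because instance IP addresses can be reused across sessions; matching on IP alone could attribute an upload to the wrong user.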

Conclusion

Amazon built a secure data lake that ingests, curates, and analyzes highly confidential COVID-19 data in support of its safety measures. To give data scientists and data analysts access to that data, Amazon had to build a new VDI environment.

By choosing AppStream 2.0 as its VDI solution rather than building its own on EC2, Amazon avoided spending time on building and managing infrastructure and was able to respond quickly. AppStream 2.0 also made it possible to ensure that users can access data only when connecting from the internal corporate environment, and that all activity is audited and traceable. In addition, AppStream 2.0 delivered better performance and a more consistent user experience than having data scientists and data analysts access the data directly from their work PCs.

As Amazon strengthens its safety measures against COVID-19, AppStream 2.0 continues to play an indispensable role in providing secure access to confidential data.

For specific deployment steps, please refer to this blog post (English).