Responsibilities:
1. Design, deploy, and maintain multiple data centers across geographies, each containing hundreds of servers and hardware devices. Plan the network topology and high-performance storage architecture across geographies.
2. Design deployment solutions for cloud computing systems with heavy loads and high concurrency, and deploy them in production environments. Support load balancing and DNS systems across multiple geographies.
3. Manage, operate, and maintain cloud computing systems; monitor host and service status; and resolve problems quickly or find workarounds when they arise. Analyze system performance bottlenecks and provide solutions.
4. Work closely with the development team to upgrade the deployed cloud computing system in a timely manner, and apply patches as needed to ensure the security and stability of the system.
5. Lead the O&M team, train O&M engineers, develop O&M specifications, and build a well-functioning team.
6. Work closely with other business departments and partner organizations to ensure the normal operation of the system.
Requirements:
1. Bachelor's degree or above in Computer Science or related majors
2. 3+ years of O&M experience, including managing hundreds of servers, and a deep understanding of O&M work. Strong analytical skills and the ability to locate problems
3. Strong operational experience with hardware environments such as servers, network devices, storage devices, etc.
4. Familiarity with virtualization technologies, especially KVM and Xen
5. Familiarity with cloud computing IaaS systems, including Amazon, CloudStack, OpenStack, Eucalyptus, OpenNebula, Rackspace, etc.
6. Familiarity with cloud storage systems, including Amazon EBS/S3, Swift, Walrus, etc.
7. Familiarity with remote monitoring technologies, such as SNMP, Hyperic HQ, etc.
8. Good knowledge of centralized configuration management technologies, such as Puppet, Chef, etc.
9. Good scripting skills, such as Shell, Python, Perl, etc.
10. Good spoken English and good English document writing skills
11. Strong communication skills, conscientious and responsible, self-driven, and able to deal with emergencies.
Additional Requirements (Optional):
1. Practical Java/Ruby/Python development experience
2. Familiarity with cloud PaaS systems (e.g. Google App Engine, Cloud Foundry)
3. Familiarity with cloud management software (e.g. RightScale, Scalr, enStratus)
4. Familiarity with data-intensive cloud computing (e.g. MapReduce/Hadoop, big data processing and analytics)