Migration from Physical to Virtual Network Functions: Best Practices and Lessons Learned
Learnings from Case Studies
The adoption of network virtualisation technology in the mobile industry is gathering pace on the path towards the digital transformation of the network. While it promises advantages in terms of cost, time to market and faster service creation, the migration to virtual networks is not free from hurdles.
This article presents a selection of stories from mobile network operators who have virtualised parts of their physical infrastructure. The aim is to illustrate both the benefits and the difficulties that had to be overcome, some of which appear as a common thread across all the reported experiences.
Challenges and Risks
- Ensuring carrier grade SLA (Service Level Agreement) on IT platform
- Increase in the number of stakeholders and the resulting integration of products
- Risk of increase in TCO when VNFs and VI managers are proprietary
- Achieve five 9s availability
- Diversification of services
- Lock in to IT vendors may occur as few vendors can provide telco-grade solutions
- Use of COTS generates cost savings
- No need to over dimension the capacity of the network or plan for major redundancy
- Time to market can be dramatically reduced
- Single vendor approach may have advantages initially as it avoids complicated troubleshooting and cross layer fault detection. Integration savings can also be realised lowering the Total Cost of Ownership
- Industry standards for virtualised components are generally below the five 9s expected by the mobile sector. An end to end design of the network can help achieve the five 9s target.
- Operation and Maintenance undergoes the biggest transformation when migrating to NFV: management of alarms and incidents, resource management, etc. The operator organisation needs to reflect this shift
- Operators may benefit from becoming integrators of the equipment used in their networks and train staff accordingly
- Virtualisation of equipment designed under the assumption that hardware and software are integrated in a physical element creates additional complexity.
According to most industry opinions, 2020 will be the starting point for commercial deployment of 5G networks. 5G will not only provide more bandwidth (eMBB, Enhanced Mobile Broadband), but also URLLC (Ultra-Reliable Low-Latency Communication) and mMTC (Massive Machine Type Communication), enabling the network to support IoT (Internet of Things) connections. The 5G network will connect people and a number of things dozens of times larger than the population, which will place disruptive demands on the network. The introduction of virtualization technology is key to solving this problem. Virtualization technology has in fact been widely used in the IT industry for many years. Its introduction in telecommunications networks will effectively reduce TCO, enable business innovation and help operators transform to 5G-ready networks. Operators must begin the transition to cloud-based network architectures now to ensure their infrastructure is ready to support new services as they emerge. Operators will begin their 5G journeys from different architectural and technological starting points, with varying service capabilities. Many operators are well on their way with network function virtualization (NFV) and programmability (SDN), but others will need to make the transition from a traditional 3G perspective.
Virtualization technology uses a unified hardware pool so that multiple network elements share the same hardware resources, which improves hardware utilization and reduces cost. An NFV network based on cloud technology such as OpenStack or VMware can realize the integration of IT and CT. Automated operation and maintenance based on big data and AI reduces operation and maintenance costs and improves efficiency.
NFV and SDN technology will give the network resilience and agility, providing customers with on-demand customization and shortening the TTM. The introduction of virtualization technology will also bring huge changes to the organization of operators. This will greatly improve the efficiency of operator management and transform operators from traditional CSPs (communications service providers) into future DSPs (digital service providers). Meanwhile, the open network capability introduced by virtualization technology will combine OTT services and the pipeline to achieve business innovation.
The greatest attraction of the 5G network is its rich vertical applications, yet it is difficult for traditional networks to provide innovative capabilities to these varied applications with agility. Scenario-oriented service provision to industrial customers and segmented markets has become the focus of future business development for operators, and has been added to the 5G network standards as one of the most strategic requirements. Operators need to adapt to the requirements of industries and segmented markets in a more flexible and faster manner and to accelerate innovation. As confirmed in the Internet industry, virtualization and cloud technology will bring unprecedented service innovation capabilities to the telecom network.
Start Now. The application of virtualization technology in the telecommunications industry is mature. Mainstream operators have already completed the POC and field trials of virtualization technology and started large-scale commercial deployment in the core network, for example vIMS, vEPC, vCS, vSDM and virtualized message systems. Virtualization technology has in fact brought practical efficiency gains and benefits to operators.
The 3-tier decoupling. ETSI developed a reference model for NFV so that all participants can carry out research and development work according to a common framework. The reference framework is extensible, from the most basic design and function up to configurations that can accommodate extreme network traffic. The reference architecture includes a hardware layer, resource management, the OSS layer and the network function layer. In the wave of NFV changes, telecom operators actively explore new technology paths while also seeking to avoid vendor lock-in.
5G ready. All investments made today, whether in telecom infrastructure or in applications (VNFs), should be able to evolve smoothly to 5G networks. This is an effective means of protecting investment from now on. It also helps operators stay competitive and rapidly upgrade the network to 5G when needed.
IT & CT convergence is the target. The short-term goal of network migration is 5G, and ICT convergence is a long-term goal. Virtualization technology will eventually lead to the integration of CT and IT, meanwhile OTT and pipelines will also be fully integrated in the future.
These technologies are being or will be introduced into the network. They have proven successful in the IT industry, but some adaptation is needed to accommodate the special needs of the telecommunications industry.
Commercial off-the-shelf (COTS) for Telecom. Unlike traditional IT applications, the telecommunication industry has many special applications, such as packet forwarding and voice/video CODEC. In the past, these processes were handled by dedicated hardware such as DSPs and NPs. With the maturity of DPDK, AVX and other technologies, and especially with the great improvement in general-purpose processor capability brought by Moore's law, telecom applications based on general-purpose processors have matched or even exceeded dedicated processing hardware. This makes it possible to run telecom applications on standard COTS servers, storage disk arrays and OpenFlow-capable Ethernet switches. Meanwhile, some vendors also provide network interface cards (NICs) with a standard PCI-E interface that contain an FPGA (field-programmable gate array) to accelerate the forwarding capabilities of COTS infrastructure in line with telecom requirements.
Telco Cloud. Cloud computing technology has proved to be an effective technology for large-scale application in the IT industry. However, the telecom industry's special needs for reliability, real-time performance and large-scale network maintainability require further enhancement of existing cloud technology. At the present stage, a proprietary NFVI is good for operators to ensure safety and reliability. Of course, in the long run, a private cloud that merges IT and CT services, and even CT services deployed on a shared public cloud, will bring higher efficiency.
NFV (Network Function Virtualization). NFV is the migration of telecom devices from existing dedicated platforms to commercial off-the-shelf (COTS) x86 servers. In existing telecom networks, all devices are deployed on private platforms. All network elements (NEs) are enclosed boxes, and hardware cannot be shared. Each device requires additional hardware for increased capacity, but this hardware is idle when the system is running below capacity. This is time-consuming, inflexible, and costly. With NFV, however, NEs are independent applications that are flexibly deployed on a unified platform comprising standard servers, storage devices, and switches. In this way, software and hardware are decoupled, and capacity for each application is increased or decreased by adding or reducing virtual resources.
SDN (Software-Defined Networking). SDN separates the control function of a routing device from its forwarding function and changes the way network routing is managed. In this way, network routing maintenance becomes simpler, more flexible, and more dynamic. Furthermore, by opening northbound interfaces, SDN allows third-party applications to control service routing. A central SDN controller can support networking both within the DC and across the WAN, as well as over IP and optical domains.
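The separation of control and forwarding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model of any real controller: the class names `Switch` and `Controller`, the method `install_route`, and the exact-match lookup on a destination string are all hypothetical simplifications (a real switch would perform longest-prefix or flow matching).

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Forwarding plane: only looks up rules, never computes routes."""
    name: str
    flow_table: dict = field(default_factory=dict)  # destination -> output port

    def forward(self, destination):
        # Unknown destinations are dropped here; a real switch might
        # instead hand the packet to the controller for a decision.
        return self.flow_table.get(destination, "drop")


class Controller:
    """Control plane: holds the network-wide view and programs switches."""

    def __init__(self):
        self.switches = {}

    def register(self, switch):
        self.switches[switch.name] = switch

    def install_route(self, switch_name, destination, out_port):
        # Stand-in for a call arriving over a northbound interface,
        # for example from a third-party application.
        self.switches[switch_name].flow_table[destination] = out_port


controller = Controller()
edge = Switch("edge-1")
controller.register(edge)
controller.install_route("edge-1", "10.0.0.0/24", "port-2")

print(edge.forward("10.0.0.0/24"))
print(edge.forward("192.168.1.0/24"))
```

The switch object never decides where traffic should go; it only applies what the controller has installed, which is the property that makes routing changes a matter of software rather than per-device configuration.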
Network Slicing. Network slicing allows network components and functions to be easily configured and reused in each network slice to meet specific application requirements, which is why many in the industry regard network slicing as the ideal network architecture for the 5G era. Network slicing technology allows operators to carve multiple virtual end-to-end networks out of one hardware infrastructure. Each network slice is logically isolated across devices, access networks, transmission networks and core networks, adapting to various types of services and meeting the different needs of users. For each network slice, exclusive resources such as network bandwidth, quality of service and security can be fully guaranteed. Because slices are isolated from one another, an error or failure in one slice will not affect communication in other slices. These advantages give network slicing an important role in the 5G network.
Microservice. Microservices are designed around the concept of high cohesion and low coupling. Microservices communicate with each other through APIs or a unified message bus. Information about user access and sessions is stored in the data sharing layer. Although located in different places, each microservice instance can obtain the latest user status through the data sharing layer. Based on this design, each microservice instance can run, scale up and down, and be upgraded separately. The distributed deployment of microservices can also improve application reliability. A VNF built at the granularity of stateless microservices can increase processing efficiency compared with the traditional 1+1 hot backup mode.
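One way to picture the guarantee of exclusive resources per slice is a simple admission check against shared capacity. The sketch below is an assumption-laden illustration: it models only bandwidth, and the names `Infrastructure`, `create_slice` and `SliceAdmissionError`, as well as the capacity figures, are hypothetical.

```python
class SliceAdmissionError(Exception):
    """Raised when a slice cannot be given its guaranteed resources."""


class Infrastructure:
    """Shared physical capacity carved into logically isolated slices."""

    def __init__(self, total_bandwidth_mbps):
        self.total_bandwidth_mbps = total_bandwidth_mbps
        self.slices = {}  # slice name -> reserved bandwidth in Mbps

    def free_bandwidth(self):
        return self.total_bandwidth_mbps - sum(self.slices.values())

    def create_slice(self, name, guaranteed_mbps):
        # Admit the slice only if its guarantee fits in what is still free,
        # so that existing slices keep their reserved share.
        if guaranteed_mbps > self.free_bandwidth():
            raise SliceAdmissionError(f"not enough capacity for {name}")
        self.slices[name] = guaranteed_mbps

    def delete_slice(self, name):
        # Removing one slice leaves the reservations of the others untouched.
        self.slices.pop(name, None)


infra = Infrastructure(total_bandwidth_mbps=1000)
infra.create_slice("eMBB", 600)
infra.create_slice("URLLC", 300)

try:
    infra.create_slice("mMTC", 200)  # exceeds the 100 Mbps still free
    admitted = True
except SliceAdmissionError:
    admitted = False

print(infra.slices, infra.free_bandwidth(), admitted)
```

Rejecting the third slice rather than shrinking the first two is what keeps each guarantee meaningful; a production system would apply the same idea across compute, storage and radio resources as well.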
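The efficiency comparison with 1+1 hot backup can be made concrete with simple arithmetic. The sketch below assumes that stateless instances can share a small pool of spares (an N+K arrangement); the values N=8 and K=2 and the function names are illustrative assumptions only.

```python
def active_share(active, standby):
    """Fraction of deployed instances that carry traffic in normal operation."""
    return active / (active + standby)


def instances_needed(required_active, standby):
    """Total instances to deploy for a given working capacity."""
    return required_active + standby


# 1+1 hot backup: every active instance has a dedicated standby.
one_plus_one = active_share(active=1, standby=1)

# N+K pool of stateless instances: K spares shared by N active instances.
# N=8 and K=2 are illustrative values only.
n_plus_k = active_share(active=8, standby=2)

print(f"1+1 hot backup: {one_plus_one:.0%} of instances carry traffic")
print(f"8+2 pool:       {n_plus_k:.0%} of instances carry traffic")
print("instances for 8 units of capacity:",
      instances_needed(8, 8), "with 1+1 versus", instances_needed(8, 2), "with 8+2")
```

Sharing spares is only possible because session state lives in the data sharing layer: any spare can take over for any failed instance, so spares no longer need to be paired one-to-one.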
Lightweight Virtualization. Compared with a traditional virtual machine (VM), the container virtualization technology provides high scalability, dense deployment, and high performance. The technology is rapidly developed and widely used in the IT industry. In the design of cloud native application architecture, application components must be deployed upon the container virtualization technology. This can improve resource usage and achieve quick service delivery and agile application maintenance. In an actual deployment, cloud native applications and underlying virtualization technologies are decoupled. They can be deployed in a mixed container/VM environment.
Open API. On a unified cloud platform, both VNFs and OTT application components run on an equal footing. Vertical application components can be integrated together with 4G/5G network components into one network slice. Within the same slice running on the same platform, vertical application and network components can efficiently communicate and share information through a message bus via an open API. The integration of technology and platform will promote the natural integration of the telecom and IT industries.
In the process of network migration, operators need to govern their network transformation and the implementation of software-centric technologies and architectures. The most important aspects of these requirements are the following:
Open Source Software. The telecommunications industry has always been a walled garden. However, with the introduction of cloud computing and other IT technologies, more and more open source technology has been accepted by the telecom industry. First, open source technology can reduce costs. Second, the work of the open source community helps many vendors to work together to rapidly advance the technological evolution of a given field. Finally, operators themselves are also transforming from simple equipment operation to R&D, so joining and increasing support for open source organizations is the best way forward.
Carrier DevOps. The introduction of cloud technology will dramatically increase the scale and complexity of the network, in turn increasing the complexity of network operation and maintenance (O&M). Rapid and effective network automation tools and processes are urgently needed. DevOps has already proved its capabilities in the cloud environment. It is an IT methodology and a collective term for a group of processes, methods and systems. Its advantage is the tight and organic combination of development and operations, enabling on-time delivery of new software functions and services. DevOps is introduced to the telecoms industry to build an agile operation flow and digital service platform, which can reduce network service time-to-market (TTM) and TCO and provide a carrier-grade service experience. Carrier DevOps applies the DevOps methodology, which includes automatic tools and testing, and brings on-demand design, continuous integration (CI) and continuous delivery (CD) capabilities to 5G E2E slice lifecycle management. Compared with the traditional 3GPP Release X to X+1 cycle, CI/CD works as a loop to realise real-time service onboarding and reduce service TTM. In addition, based on big data analytics, it supports policy-driven closed-loop control of design, development, verification and testing, fulfilment and assurance.
Automation/Zero Touch. A telecom network built from VNFs should be highly automated across multiple phases, including blueprint design, resource scheduling and orchestration, lifecycle management, status monitoring, and control policy update. These phases dovetail with each other through a closed-loop feedback mechanism. This can achieve one-click deployment, full autonomy, and highly effective management. The automated platform enables users to design and deploy NEs and networks for featured services quickly. Big data systems together with AI/ML systems are well suited to telecommunication networks and can monitor the state of the E2E network, enabling the development of effective network fault processing strategies and business development reviews based on predefined policies. Strategies can be developed for the automatic analysis of massive volumes of historical data on a slice-by-slice basis. When a failure occurs, the system can automatically act on the affected network to achieve zero-touch resolution.
Organizational rebuild. The traditional operator’s organizational structure is based on the traditional chimney (siloed) network structure. When a unified hardware resource pool and cloud technology are introduced, IT and network departments will need to integrate in many areas, and the scopes of the CIO and CTO will also need to be merged. As operators place more and more emphasis on R&D capabilities with DevOps tools, sales, marketing and operation departments must also cooperate more closely under the new workflow to respond more efficiently and flexibly to customer demands.
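The closed-loop feedback between status monitoring and control policy can be illustrated with a very small threshold-based scaling loop. This is a sketch under stated assumptions rather than a description of any MANO product: the thresholds, instance limits, function names and load samples are all hypothetical.

```python
def desired_instances(current, cpu_load, low=0.3, high=0.7,
                      min_instances=2, max_instances=10):
    """Return the instance count a simple threshold policy would ask for.

    cpu_load is the average utilisation across instances, between 0 and 1.
    The thresholds and limits are illustrative defaults, not recommendations.
    """
    if cpu_load > high:
        target = current + 1      # scale out under pressure
    elif cpu_load < low:
        target = current - 1      # scale in when mostly idle
    else:
        target = current          # inside the comfort band: do nothing
    return max(min_instances, min(max_instances, target))


def control_loop(initial, load_samples):
    """Feed monitored load samples through the policy, one step per sample."""
    count = initial
    history = []
    for load in load_samples:
        count = desired_instances(count, load)
        history.append(count)
    return history


# Hypothetical monitoring samples: a traffic peak followed by a quiet period.
print(control_loop(initial=2, load_samples=[0.9, 0.8, 0.5, 0.2, 0.1, 0.1]))
```

Each pass through the loop is monitor, decide, act; the lower bound on instances keeps a redundancy floor even when the network is idle.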
Network virtualization migration involves restructuring infrastructure, service functions, and operation and maintenance (O&M). In actual implementation, the requirements at different stages such as network planning, design, integration, deployment, O&M and optimization should be considered. Due to its large integration workload, long duration, and high complexity, the implementation faces many challenges. At the stage of network planning and design, the challenges include planning data centres, selecting an appropriate hierarchical decoupling mode, networking and assuring security. Depending on the complexity and the deployment strategy, the possible choices are single-vendor mode, hardware-independent mode, shared virtual resource pool mode, or fully-decoupled mode. At the integration and deployment stage, the major challenges are how to integrate multiple vendors’ software components and hardware into a stable and efficient system, and how to apportion SLAs between different components. As to the hardware, blade or rack servers can be chosen, but in addition to unified infrastructure, other factors such as reliability, networking, and installation should also be considered. The available virtualization software includes VMware and OpenStack. The open-source OpenStack is usually adopted to avoid vendor lock-in, and the native OpenStack API facilitates integration. At the O&M and optimization stage, the challenges are how to achieve an optimal price to performance ratio, and how to quickly locate faults in a multi-component environment. To achieve the best price to performance ratio, component-based performance indicators at various layers such as hardware, virtualization software, service application software, and MANO software should be considered. The component at each layer has its own optimization method.
For instance, in a typical virtualized core network project, a fully virtual core network uses optimization techniques such as hardware BIOS parameter settings, virtual machine affinity deployment policies, and DPDK forwarding acceleration to achieve better performance and lower cost than the legacy system. In addition to the above challenges, operators also face the pressure of the rapid upgrade cycle of IT technologies.
Network infrastructure reconstruction means building open cloud ICT infrastructures, based on open source software and architectures and enhanced for carrier-grade operation. This infrastructure then forms a unified platform providing infrastructure-as-a-service (IaaS)-like capabilities to communications functions supporting NFV/SDN. The key aspects of network infrastructure reconstruction are:
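The text does not say which affinity rules were applied in such projects. As one illustration of what a virtual machine placement policy looks like, the sketch below implements an anti-affinity rule, which keeps VMs of the same group on different hosts so that a single host failure cannot take out the whole group. The function name, VM names and host names are hypothetical.

```python
def place_with_anti_affinity(vms, hosts):
    """Assign each VM to a host so that no two VMs of the same group share one.

    vms   : list of (vm_name, group) tuples
    hosts : list of host names
    Returns a dict vm_name -> host. Raises ValueError when a group has more
    members than there are hosts, since the rule cannot then be satisfied.
    """
    placement = {}
    used_by_group = {}  # group -> set of hosts already holding a member
    load = {h: 0 for h in hosts}

    for vm_name, group in vms:
        taken = used_by_group.setdefault(group, set())
        candidates = [h for h in hosts if h not in taken]
        if not candidates:
            raise ValueError(f"no host left for {vm_name} in group {group}")
        # Among allowed hosts, pick the least loaded to spread VMs evenly.
        host = min(candidates, key=lambda h: load[h])
        placement[vm_name] = host
        taken.add(host)
        load[host] += 1
    return placement


vms = [("mme-1", "mme"), ("mme-2", "mme"), ("gw-1", "gw"), ("gw-2", "gw")]
placement = place_with_anti_affinity(vms, ["host-a", "host-b"])
print(placement)
```

An affinity rule is the mirror image: it would co-locate VMs that exchange heavy traffic on the same host to reduce latency, at the cost of a larger failure domain.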
Layered Decoupling. Use common IT HW, with a 2-layer open structure of SW & HW and resource & application.
Open source platform. Based on open source projects, comply with ETSI NFV standard, open interfaces.
Fully compatible. Compatible with multi-vendor devices and components.
Dynamic virtual resource management. On-demand allocation improves the resource utilization rate.
Carrier-grade Performance optimization. High-speed network forwarding technology fully optimizes NFVI performance.
Secure & reliable. Carrier-grade enhancement, reliability >99.999%.
Flexible networking. SDN-based centralized control, minute-level quick orchestration. A central SDN controller supports both single-DC and multi-DC networking modes, and both SDN and non-SDN networking modes.
Intelligent management. Based on Big Data analysis, ML/AI.
Gradual integration of IT/CT resource pool. Air conditioner/power supply sharing; network sharing (switches, routers, etc.); compute/storage resource pool sharing; unified management and unified resource scheduling.
Instead of monolithic network functions on special hardware, network functions are re-written for general compute platforms as composable VNFs consisting of micro-service components, put together under automated management and orchestration. The network functions are expected to support on-demand network slicing and layered deployment in distributed data centres. Beyond traditional 3GPP FE virtualization, an open API for the service architecture and container technology are adopted to build lightweight network components, while retaining the openness of the network capabilities they provide. Cloud Native is regarded as a collection of cloud technologies and enterprise management methods, and is now widely accepted for VNFs.
Cloud Native includes the following characteristics:
- Micro Service Architecture
- Stateless components
- Separated data layer
- Independent upgrade/scaling
- Light weight container technology
- Automated scaling
- Failure self-heal
- Automated optimization
A typical Cloud Native portfolio includes:
- Virtual Voice over LTE (vVoLTE): The VoLTE solution includes virtualized instances of all core network elements (SBC, CSCF, TAS, HSS, MGCF, MRFP, ENUM, EMS, CG, and others). All components have been deployed commercially by operators including Telefonica Group, Telekom Austria Group and China Mobile.
- Virtual Evolved Packet Core (vEPC): Using DPDK (data plane development kit), FD.io (fast data input/output) and related NFV acceleration technologies, some vEPC implementations have achieved performance that allows them to fully replace a traditional EPC. The solution allows for control/user plane split and the use of a distributed user plane, and provides an affordable way for CSPs to support IoT use cases.
- Virtual message platform: Traditional message services such as VMS, SMS, MMS and RCS have been running on the x86 platform for a long time. These products can be migrated to the virtualization platform, with MANO introduced to achieve elastic service orchestration.
The transformation of cloud operation needs to be based on the actual situation of each operator. It is suggested that the transformation of the operation system be promoted progressively in 3 stages.
The first stage: in the initial stage of transformation, the new network can be managed in a traditional way. Traditional BSS/OSS does not need to be aware of whether a device is virtualized or not. MANO is responsible for managing virtualized equipment; it is deployed independently and can be launched quickly. The drawback of this approach is that the advantages of the new network cannot be brought into full play, and the operation and maintenance mode is the same as the traditional mode. In the first stage, an orchestration system is introduced to realize NFV/SDN management.
The second stage: the transformation gradually deepens, with new and old network operations running in parallel. The new self-service portal plus orchestration manages the SDN/NFV network, while the traditional network is managed by the existing BSS/OSS system. The existing BSS/OSS needs to support the interface requirements of the new management network elements in order to realize true end-to-end network management.
The third stage: the traditional BSS is upgraded to integrate orchestration and DevOps capability, and OSS and orchestration converge to achieve integrated management of new and old networks. The main goals of the third stage are to offer the capabilities of the whole network as services, intelligent E2E network operation and maintenance, automatic real-time assurance of business SLAs, and user-defined services.
Dumb pipe services, as provided by traditional operators, are increasingly challenged by OTT players. New business scenarios such as IoT and vertical applications enabled by the 5G network show a promising future for operators, and encourage traditional CSPs to move quickly to become DSPs. To achieve maximum benefits from deploying NFV and cloud solutions, operations primarily require support in managing the transition from legacy, appliance-focused environments to new software-centric NFV systems capable of supporting both IT and communications functions at maximum efficiency. The legacy and new network elements will be operated in parallel for a long time, and a clear path for migration between the two states is required as well. At the same time, the organizational structure of operators needs to be adjusted in response.
Lessons Learned
Security and reliability are necessary requirements for meeting carrier-grade service needs in the virtualization architecture. In the new cloud architecture, we have to take full advantage of live migration, VM rebuild, snapshot and other technical means provided by the cloud platform; inherit the original hot backup, redundancy and N+K backup ideas; introduce MANO to achieve end-to-end resource coordination; and set up smart policies to achieve automatic, dynamic, elastic closed-loop adjustment of the network. An experienced partner can help operators meet the carrier-grade requirements while greatly reducing the network TCO and supporting the future evolution of the network.
Viva is one of the top two operators in the Kuwait mobile communications market, with a market share of 33% (2014). Viva was one of the leading operators to introduce network virtualization to its networks, and its virtualization projects received the “Editor’s Choice Award” at the Network World Middle East Awards. In this context, this case study examines the migration experience of Viva Kuwait to highlight the rationale behind the migration, the benefits realized, and lessons for other operators.
Network virtualization was a natural step for Viva as it was an integral part of its network transformation roadmap. A virtualized network was necessary for Viva Kuwait to be prepared for new services beyond mobile broadband and ultimately enable the digital transformation of society.
Viva Kuwait also sought network virtualization to resolve complexities in the network, high OPEX (Operating Expense) and long time-to-market. With network virtualization, Viva Kuwait can enjoy resource efficiency via centralized deployment, accelerate TTM (time-to-market), simplify maintenance (and consequently reduce OPEX), and enable an open network ecosystem.
Viva Kuwait started its migration to a virtualized network in 2015. It virtualized STP (Signal Transfer Point) and completed the first phase of VAS (Value-added Services) virtualization. In 2016, it further virtualized HSS (Home Subscriber Server), IMS (IP Multimedia Subsystem), EPC (Evolved Packet Core) and the final phase of VAS. Finally, in 2017, it virtualized PCRF (Policy and Charging Rules Function).
The virtualization resulted in two data centre networks based on unified COTS (Commercial Off-The-Shelf) hardware. The two data centres each consist of 3 vDCs (virtualized data centres) to achieve the best utilization of resources: one for cloud core, one for cloud edge and one for common management. The cloud core vDC includes vIMS, vHSS, vSTP, vPCRF and vDSP (Demand-Side Platform). The cloud edge vDC includes vEPC and vePDG (evolved Packet Data Gateway). This grouping allowed Viva Kuwait to combine VMs (virtual machines) with similar functions in the same HA (High Availability) for highest performance and flexible configuration based on services. Viva Kuwait also deployed similar virtual private clouds within the same vDC, further enhancing resource efficiency and resulting in an 85% resource saving for management VMs.
Looking specifically at the vIMS/vHSS migration case, Viva Kuwait deployed vNICs (Virtual Network Interface Controllers) based on EVS (Elastic Virtual Switch) technology to provide high signaling performance and live VM migration without service interruption. It also deployed a high-redundancy disk storage array based on RAID10.
- We led the deployment of the EVS (Enhanced Voice Services) CODEC in the Kuwait market, for a better customer experience and to be ready for voice over 5G, which requires such CODECs. This required more than 8 months of testing with different device vendors (Apple, Samsung and Huawei), customizations in the vIMS network, and customization in the eNodeB as well.
- The next phase is to agree with device vendors to support SWB as a default CODEC during negotiation with the network, to provide better voice quality.
- As the leader in deploying VoLTE interconnection, we agreed with local operators to migrate all interconnection CS traffic to VoLTE interconnection routes, which will pave the road to all-IP interconnection, stop investment in TDM links and deliver higher call quality. The main problems faced were the interconnection CDRs, which require the IoI information, and the CODECs used during negotiation.
In virtualizing EPC, Viva Kuwait deployed VNFs (Virtual Network Functions) based on stateless design. The vEPC's vNIC was based on SR-IOV (Single Root I/O Virtualisation) technology, which provides high performance in throughput and latency compared to EVS and the standard Open vSwitch. Viva Kuwait also employed distributed load balancing and distributed session DBs (Databases). The vEPC also adopted a high-redundancy disk storage array based on RAID 10, and the network was designed on the basis of ACT/ACT N-Way redundancy and fault tolerance. Finally, vEPC and EPC were deployed in a hybrid pool to enable smooth service migration and to maximize resource usage.
- We were the leaders in Kuwait to deploy a real and live demo of 5G services which introduced high data rates to customers and new 5G services like eMBB, 4K/8K, and VR.
- The deployed network is based on NSA option 3X where the NR is the “Traffic Steering Point”
- NSA was chosen for fast NR deployment by utilizing continuous LTE coverage.
- We used upgraded versions of vEPC nodes (vMME+, vGW-C/U, and vHSS+) to implement CUPS during our 5G live demo, where the control plane is served by the eNodeB, while the user plane is served by the NR.
- The target was to achieve a short deployment time and to integrate 5G NR sites with a TAC independent from the existing 4G network, with no effect on existing live 4G subscribers.
In the initial phase of network virtualization, Viva Kuwait used a single vendor approach to avoid complicated troubleshooting and cross layer fault detection. This also enabled lower TCO (Total Cost of Ownership) as there was no need for customization for integration of different vendors. Other benefits include fast TTM and simple O&M (Operations and Maintenance).
After gaining experience in NFV (Network Function Virtualization), Viva Kuwait started to introduce different vendors for low-traffic services, for example VAS nodes such as SMSC and USSDGW (Unstructured Supplementary Services Data Gateway). Viva Kuwait expects that the change to a multi-vendor approach will allow selection of the best vendor for each layer and exploit the potential of cloud technology.
In the future, Viva Kuwait will implement PaaS (Platform-as-a-Service) capabilities such as containers and microservices. This will start with static MEC (Multi-Access Edge Computing) and network slicing services, and then move to on-demand network slicing.
Viva Kuwait will also introduce a common management platform for unified management in addition to fault detection and auto healing. This will enable agile and intelligent O&M for the new generation 5G cloud. This will also include multi-tier DC management and light deployment to enable MEC.
In addition, vEPC will also be upgraded to support NSA (Non-Standalone) 5G deployment. vEPC will also introduce DECOR (Dedicated Core Network) to enable provisioning of differentiated service over EPC.
Finally, vIMS will be modernized to facilitate migration to 5GC (5G Core). For example, IP-SMS-GW (IP Short Messaging Service Gateway) will be modernized to support the SGd interface so that SMS can be supported in 5GC, which will save the investment of deploying a new SMSC (Short Message Service Centre) for 5GC. Also, because new interfaces and protocols such as HTTP will increase message length and require more links than the conventional Diameter protocol, the existing DRA (Diameter Routing Agent) will be migrated to the cloud. DRA will also be modernized to HRA (Hierarchical Routing Architecture) to support huge signaling requirements. SEPP (Security Edge Protection Proxy) will also be introduced for security and topology hiding when supporting roaming among 5GCs.
Viva Kuwait believes that the main challenge in virtualization is ensuring a carrier grade SLA (Service Level Agreement) on an IT platform. While the availability of a typical IT system is 99.9%, a carrier grade cloud requires 99.999%. Furthermore, detection of a failed compute node needs to happen within a second for carrier grade, but it can take more than one minute on an IT based platform.
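The gap between these two targets is easier to appreciate when expressed as permitted downtime per year. The following Python sketch converts an availability percentage into annual downtime and shows how redundancy can raise availability. The function names are illustrative, and the parallel formula assumes that redundant units fail independently, which real platforms only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def parallel_availability(unit_availability: float, replicas: int) -> float:
    """Availability of redundant units when any single one is sufficient.

    Assumption: the units fail independently of each other.
    """
    return 1 - (1 - unit_availability) ** replicas


it_grade = annual_downtime_minutes(99.9)         # roughly 526 minutes per year
carrier_grade = annual_downtime_minutes(99.999)  # roughly 5.3 minutes per year
two_replicas = parallel_availability(0.999, 2)   # roughly 0.999999

print(f"99.9%   allows {it_grade:.1f} minutes of downtime per year")
print(f"99.999% allows {carrier_grade:.2f} minutes of downtime per year")
print(f"Two independent 99.9% units in parallel: {two_replicas:.6f}")
```

Under the independence assumption, two 99.9% units in parallel exceed five 9s on paper, which illustrates why redundancy and fast failure detection matter more than the reliability of any single server.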
To achieve a carrier grade SLA, Viva Kuwait is implementing enhancements to the VNF layer and the COTS layer. For the VNF layer, Viva Kuwait is deploying VNFs based on stateless VMs. This means that session information is not stored in the VM itself but in a shared database. If one VM is lost, the subscribers’ sessions are not lost: another processing VM can query the shared database layer to obtain the session information and resume the sessions without users being disconnected, and without the need for any redundancy technique at the processing VM level. This is the recommended VNF architecture and is commonly seen as a step towards cloud native. Viva Kuwait is also deploying the traffic capacity and throughput VMs based on the N-way redundancy principle instead of the ACT/STB (active/standby) principle, which enables high reliability and multi-fault tolerance. Furthermore, the deployment of distributed load balancing and a distributed session DB enables resource efficiency and faster scale in/out without service interruption. Finally, Viva Kuwait is deploying a new Cloud OS release to support VMs with redundancy and fast recreation capability.
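The stateless pattern described above can be illustrated with a minimal Python sketch. It is not Viva Kuwait's implementation: the class names, the in-memory dictionary standing in for the shared database, and the session fields are all hypothetical and chosen only to show the idea.

```python
class SharedSessionStore:
    """Stand-in for the shared database layer (a plain dict in this sketch)."""

    def __init__(self):
        self._sessions = {}

    def load(self, subscriber_id):
        # Return a copy so callers never hold a reference to stored state.
        return dict(self._sessions.get(subscriber_id, {"bytes_used": 0}))

    def save(self, subscriber_id, state):
        self._sessions[subscriber_id] = dict(state)


class ProcessingVM:
    """Stateless worker: keeps no session data between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, subscriber_id, payload_bytes):
        state = self.store.load(subscriber_id)   # fetch session state
        state["bytes_used"] += payload_bytes     # process the request
        state["last_served_by"] = self.name
        self.store.save(subscriber_id, state)    # write the state back
        return state


store = SharedSessionStore()
vm_a = ProcessingVM("vm-a", store)
vm_b = ProcessingVM("vm-b", store)

vm_a.handle("subscriber-1", 1000)
del vm_a  # simulate losing the VM; no session data is lost with it
resumed = vm_b.handle("subscriber-1", 500)
print(resumed)
```

Because the second VM continues from the usage counter written by the first, no per-VM standby is needed; the reliability burden moves to the shared database layer, which must itself be redundant.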
For the COTS layer, Viva Kuwait is building COTS based on a structurally redundant design with lower failure rates, deploying storage HD (Hard Disk) arrays based on the RAID 10 principle. In addition, Viva Kuwait is deploying virtual NICs based on SR-IOV. This enhances packet forwarding capability, providing throughput rates of 40 Gbps per core and single-trip latency as low as 25 µs. SR-IOV also enables VMs to connect directly to physical NICs, obtaining I/O performance and low latency equivalent to those of the physical NICs, while multiple VMs can share the same physical NIC.
Viva Kuwait also believes that any operator migrating from a physical to a fully virtualized network should consider the following:
- The operator should decide on its roadmap for VNF transformation, whether platform-driven NFV or application-driven NFV.
- In a multi-layered architecture, it is difficult to demarcate faults (i.e. a layered network and decoupled architecture double the difficulty of problem delimitation).
- It is challenging for non-carrier grade hardware to run HA services.
- Integration in a multi-vendor environment is very complex (i.e. different combinations of interfaces multiply the complexity of compatibility).
- It is difficult to harmonise the migration when the legacy network evolution roadmap is unclear (for example, how to ensure that massive legacy assets, subscribers and services migrate smoothly from PNF to VNF).
Before starting the migration journey to a virtualized network, Viva Kuwait expected benefits in TCO and time-to-market once virtualization was complete. These expectations are evaluated below based on Viva Kuwait’s 3 years of commercial experience.
Viva Kuwait was able to achieve cost reduction by consolidating over 20 existing hardware variants into a single COTS hardware platform. In addition, it can now target the peak value for capacity planning rather than planning for 30% over the peak value. Finally, the redundancy rate improved to 20% with N-way redundancy, from 50% in the past.
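To see how these two changes compound, the following Python sketch works through the arithmetic. It reads the redundancy rate as the share of deployed capacity held in reserve (50% for 1+1 active/standby, 20% for example with four working units plus one spare). This interpretation, the function name and the peak load of 100 units are assumptions made for illustration and are not figures from the case study.

```python
def deployed_capacity(peak_load, headroom, spare_share):
    """Total capacity that must be deployed.

    headroom    -- extra working capacity planned above peak (0.30 means 30%)
    spare_share -- fraction of deployed capacity held as redundancy
    """
    working = peak_load * (1 + headroom)
    return working / (1 - spare_share)


PEAK = 100  # hypothetical peak load, in arbitrary capacity units

# Legacy planning: 30% over peak, with 50% of deployed capacity in reserve.
before = deployed_capacity(PEAK, headroom=0.30, spare_share=0.50)

# Virtualized planning: peak only, with 20% of deployed capacity in reserve.
after = deployed_capacity(PEAK, headroom=0.0, spare_share=0.20)

saving = 1 - after / before
print(f"before: {before:.0f} units, after: {after:.0f} units, saving: {saving:.0%}")
```

Under these assumptions the deployed capacity falls by roughly half. The result only illustrates how the effects multiply and is not a saving reported by Viva Kuwait.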
The time-to-market benefit was also realized. With virtualization, it became possible to change network functions immediately with a software upgrade. In the past, 3 to 5 months were required to change a network function, as the process involved hardware delivery, software upgrade and deployment. Viva Kuwait now enjoys a TTM of less than 2 months to get new services online.
From its migration experience Viva Kuwait learnt the following lessons:
- It is crucial to learn end-to-end design for five-9s (99.999%) service availability.
- Operators need to follow a proactive approach and handle incidents before they impact users, for example through auto healing, multi-layer fault detection and automatic resource management.
- It is important to fine-tune resource management across different DCs.
- Operators must change the organization structure to have a common department that handles the NFV resources for both the Telco cloud and the IT cloud.
SKT is the market leader in the South Korean mobile communications market, with a market share of 48.2% (2017). SKT has led in various areas of telecommunications and was one of the first to debut PoCs (Proofs of Concept) of virtualized EPC (Evolved Packet Core) and virtualized IMS (IP Multimedia Subsystem) in 2013. Since starting its migration to a virtualized network in 2015, it has come to operate hundreds of commercial VNFs (Virtual Network Functions) in its data centres. This case study examines SKT’s migration experience to highlight the rationale behind the migration, the benefits realized, and the lessons for other operators.
SKT introduced virtualized networks in the expectation of reducing cost and transforming its cost structure into a slim and lean one like that of IT (Information Technology) data centres. Virtualization was essential to streamline the network infrastructure compared with traditional telco network infrastructure.
SKT also expected virtualization to reduce the time-to-market of new services on the network. In theory, because virtualization separates the hardware from the software, introducing new features would take significantly less time than with legacy dedicated network equipment.
Finally, SKT aimed to foster service innovation with its new virtualized networks. Having a fully virtualized network would open up opportunities to develop and implement new services on the network.
Virtualization is expected to transform the telecom network infrastructure into lean and slim telco data centres. These telco data centres, however, need to guarantee telco-grade performance, which required significant consideration during development. In preparation, SKT consulted different IT vendors and traditional CT equipment vendors.
In 2013, two years before the start of the migration, SKT conducted a number of feasibility tests and PoCs with vendors ranging from traditional network equipment vendors (e.g., Nokia and Samsung) to IT vendors (e.g., HP Enterprise and VMware). SKT’s efforts came to fruition with the world’s first PoC for vEPC (virtualized EPC) and vIMS (virtualized IMS) in the same year. In 2014, SKT developed a roadmap for its virtualization strategy and prepared for virtualizing the commercial networks.
In 2015, SKT started its migration by commercializing vEPC and vIMS. The NFV (Network Function Virtualization) orchestrator that orchestrates the network overall was also implemented by SKT. However, VNF (Virtual Network Function) managers and VI (Virtual infrastructure) managers were provided by vendors, as they tend to come in as a single package along with EMS (Element Management System). All of its main central offices were now equipped with vEPC and vIMS.
Two years later, in 2017, SKT started replacing the vendor proprietary VNF managers and VI managers with generic VNF and VI managers of its own. This work was still ongoing as of June 2018 in all its main central offices. Furthermore, SKT developed an end-to-end network orchestrator, which is being deployed in its data centres. The orchestrator was kept separate from the OSS (Operations Support System) to keep the operations of the physical network and of the virtualized network separate.
To achieve the vision of a fully virtualized network, SKT has been deploying only virtualized versions of network functions, and much of its legacy infrastructure had been virtualized as of June 2018. In 5G, SKT expects to start with a fully virtualized network to leverage the power of virtualization from the outset.
Furthermore, the physical and virtual networks will be harmonized with SKT’s next generation OSS solution, called TANGO. It will evolve to provide single-view monitoring and correlation analysis of the end-to-end network, including physical and virtual networks, moving toward fast reactive and proactive operation, zero-touch optimization, and sophisticated customer experience management.
Finally, virtualization will be coupled with next generation technologies such as network slicing and MEC (Multi-access Edge Computing) to bring service innovation. SKT expects that the virtualized network will prepare it to exploit the full potential of network slicing and MEC.
During the migration, SKT faced some challenges, especially in operations. Firstly, there were more stakeholders involved: hardware vendors, software vendors and management solution vendors. It was difficult to coordinate and integrate this number of stakeholders. SKT resolved this by organizing an expert group under the operations department; the group was educated on the basics of virtualization and the details of the respective components, and cooperated across different departments.
Secondly, being the integrator of the network software, hardware and management meant that SKT had to have knowledge of the code and the implementation of the different components. SKT’s research and development team participated in open source projects and analysed their code to gain a deeper understanding of virtualization. This deeper understanding allowed SKT to integrate and manage the virtualized network effectively.
In addition, SKT launched the virtualized systems (EPC and IMS) in isolation: only the network functions were virtualized, while hardware and operations remained separate for each system. This meant that neither experience nor infrastructure was shared, limiting the benefits of virtualization. To resolve this, SKT is preparing for 5G to start with integrated virtualized infrastructure and operations before developing network functions.
Lastly, there was operational complexity and an increase in TCO (Total Cost of Ownership) because the VNF managers and VI managers were vendor proprietary. Whenever a new VNF was introduced, new VNF managers and new VI managers were required, which increased complexity. This was due to fragmentation in management, but also to the use of different versions of OpenStack. To resolve this, SKT developed generic managers and an orchestrator to ensure interoperability in the management of network functions and the overall network.
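One common way to obtain this kind of interoperability is to place vendor-specific managers behind a single interface. The Python sketch below illustrates that general adapter pattern only; it does not describe SKT's actual design, and every class, method and vendor name in it is hypothetical.

```python
from abc import ABC, abstractmethod


class VnfManagerAdapter(ABC):
    """Common interface that each vendor-specific manager is wrapped in."""

    @abstractmethod
    def instantiate(self, vnf_name: str) -> str:
        """Create a VNF instance and return its identifier."""

    @abstractmethod
    def scale_out(self, instance_id: str, extra_vms: int) -> int:
        """Add VMs to an instance and return the new VM count."""


class InMemoryAdapter(VnfManagerAdapter):
    """Hypothetical adapter; a real one would call the vendor's own API."""

    def __init__(self, vendor: str):
        self.vendor = vendor
        self._vm_counts = {}

    def instantiate(self, vnf_name):
        instance_id = f"{self.vendor}/{vnf_name}"
        self._vm_counts[instance_id] = 1
        return instance_id

    def scale_out(self, instance_id, extra_vms):
        self._vm_counts[instance_id] += extra_vms
        return self._vm_counts[instance_id]


class GenericVnfManager:
    """Routes lifecycle requests to whichever adapter owns the instance."""

    def __init__(self):
        self._adapters = {}
        self._owners = {}

    def register(self, adapter):
        self._adapters[adapter.vendor] = adapter

    def instantiate(self, vendor, vnf_name):
        adapter = self._adapters[vendor]
        instance_id = adapter.instantiate(vnf_name)
        self._owners[instance_id] = adapter
        return instance_id

    def scale_out(self, instance_id, extra_vms):
        return self._owners[instance_id].scale_out(instance_id, extra_vms)


manager = GenericVnfManager()
manager.register(InMemoryAdapter("vendor-a"))
manager.register(InMemoryAdapter("vendor-b"))

epc = manager.instantiate("vendor-a", "vEPC")
ims = manager.instantiate("vendor-b", "vIMS")
epc_vms = manager.scale_out(epc, extra_vms=2)
print(epc, ims, epc_vms)
```

With this structure, introducing a VNF from a new vendor means writing one adapter rather than operating another full management stack, which is the kind of complexity reduction the paragraph above describes.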
Before starting the migration journey to a virtualized network, SKT expected benefits in TCO, time-to-market and service innovation once virtualization was complete. These expectations are evaluated below based on SKT’s 3 years of commercial experience.
SKT found that TCO reduction was possible by separating the hardware from the software. However, it should be noted that virtualization was done in siloed systems, and having many small-scale virtualizations limited the possibilities for cost reduction. To maximize the TCO reduction, a common network infrastructure is required across the whole network, just as in IT data centres.
In addition, there is a caveat to the cost reduction: SKT experienced lock-in to new IT vendors in place of traditional CT equipment vendors. Whereas SKT expected many innovative small vendors to join the ecosystem and produce a reasonable mix of vendors, the reality was the opposite. As economies of scale can only be realized by large vendors, only a limited number of IT vendors could supply hardware at the desired price, and hence SKT experienced another lock-in, this time to these IT vendors.
Time-to-market was the benefit that exceeded SKT’s expectations. With virtualization, it became possible to change network functions immediately with a software upgrade. In the past, 3 to 4 months were required to change a network function, as the process involved hardware delivery, software upgrade and deployment. Provided that there is sufficient hardware infrastructure, changes to software and scale can be made very quickly.
Service innovation did not meet expectations, as there was hardly any change in services. In SKT’s virtualized network, the network functions are mostly limited to QoE (Quality of Experience) management and there is very little diversity in the services offered. This is because only a few vendors understand the telco ecosystem and its requirements.
From the case of SKT, operators can learn the following valuable lessons for their future virtualization strategy and implementation.
First, the migration should start with network infrastructure and operations. If the network infrastructure and operations are not integrated, resources and operational experience of the systems will not be shared across different network functions. It means that operations have to start over every time a network function is introduced. Therefore, in 5G, SKT will start from virtualization of network infrastructure rather than virtualizing network functions.
The migration should also take into account the effect of economies of scale and the fact that vendor diversification may be challenging to achieve. In the case of SKT, only a few vendors were able to supply hardware at competitive prices, and only a few others had the knowledge to understand telco requirements. Although small vendors may provide innovative solutions, a telco-grade data centre would still create dependency on a few vendors, and operators need to be prepared for that dependency.
It also turned out that virtualization of 4G networks was very challenging. Although virtualizing network functions first did make operations complex, it is also the case that 4G networks were designed for the legacy CT equipment scheme, where hardware and software are integrated in one physical box. Virtualization in 5G networks should be much easier, as 5G networks are designed to be cloud native, minimizing the gap between virtualization and mobile networks.
Lastly, virtualization is not an end in itself but only a means to create a more agile and flexible network. As virtualization prepares operators for network slicing and MEC, operators must look beyond simply virtualizing their networks and towards enabling new services with network slicing and MEC. Operators are recommended to seriously consider the possibilities for service innovation when virtualizing their networks.