
Security in ordinary OS, UNIX

· 7 minute read
Mr. Frugal
I'm afraid of paying full price.

UNIX System

Unix was developed at Bell Labs (Ken Thompson, Dennis Ritchie, and Douglas McIlroy) in the late 1960s and early 1970s. It is a multi-user operating system that protects users from one another and protects system services from users. Unix was designed as a simpler and faster alternative to Multics.

UNIX Security

  • A running Unix system consists of the operating system kernel and many processes, each running a program. The protection ring isolates the Unix kernel from the processes, and each process has its own address space.
  • Unix uses the concept of files for all persistent system objects, such as secondary storage, I/O devices, network, and inter-process communication.
  • Unix security aims to protect users from each other and to protect the Trusted Computing Base (TCB) from all users.
  • The UNIX TCB consists of the kernel and several processes that run with the identity of the privileged user root, also known as the superuser.
  • These root processes provide various services, including system boot, user authentication, administration, and network services.
  • A Unix process is associated with an identity based on the user associated with the process. Access to files is limited by the process's identity.
    • Each user owns a set of files - providing a simple way to express who else can access them. All user processes run as that user.
    • The system owns a set of files - The root user is defined as the system principal and can access anything.
    • Users can invoke system services but need to switch to the root user (setuid) to do so.
  • Both the kernel and root processes have full system access.
  • The UNIX DAC security model cannot express many security requirements: too many rights are granted by default, and the means for limiting rights are impractical.
  • The use of UNIX mechanisms has evolved over time, resulting in vulnerabilities.

Protection System

  • Implements a classical protection system, not a secure protection system.
  • The Unix protection system consists of a protection state and a set of operations that enable processes to modify that state. Thus, UNIX is a discretionary access control (DAC) system.
  • Aspects of a Secure Protection System:
    • The Unix protection system defines a transition state that describes how processes change between protection domains.
    • The labeling state is largely ad hoc. Trusted services associate processes with user identities, but users can control the assignment of permissions to system resources.
    • In the final analysis, these mechanisms and the DAC are insufficient to build a system that satisfies the secure OS requirements.

Protection status:

  • Subjects
    • Users
    • Groups
    • The process accesses objects on behalf of the corresponding user.
  • Objects
    • Files
    • Directories
  • Operations
    • Read
    • Write
    • Execute

Subjects:

  • Users:
    • username
    • uid
    • groups
    • Special user - root
    • Nobody - special user with no ownership and belonging to no groups
  • Process:
    • uid, gid - each process carries a real, an effective, and a file system user ID (and group ID)
    • Users run processes.
  • Groups:
    • Users belong to one or more groups.
    • The primary group is defined in /etc/passwd.
    • All other groups are defined in /etc/group.
    • Commands to change group membership (newgrp).
    • Group membership grants additional permissions beyond uid.

Challenges:

  • Unix is more focused on protection than security. It assumes a non-malicious user and a trusted system by default.
  • One challenge in Unix is the Discretionary Access Control (DAC) system, which allows a user or their processes to update permission assignments. This means that each program has all the user's rights, and the user must trust that their processes are not malicious.
  • Another challenge is assigning file permissions based on what is necessary for things to work. Unfortunately, this means that all user processes are granted full access, and services have unrestricted access. Furthermore, users can invoke setuid(root) processes with all rights, which means they must trust the system processes.

Authentication

  • Login Process:
    • Starts at boot time and runs as root.
    • Takes a username and password.
    • Applies crypt() to the password with stored salt.
    • Compares the resulting password to the value in /etc/shadow for that user.
  • Start a process for a user:
    • Execute the file specified as login in /etc/passwd.
    • The identification (uid, gid, groups) is set by the login.
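The password-comparison step above can be sketched as follows. This is a simplified stand-in: real systems use crypt(3) with schemes such as sha512crypt or yescrypt, and the salt is stored alongside the hash in the /etc/shadow entry; the SHA-256 construction and sample values here are illustrative assumptions.

```python
import hashlib

# Hypothetical sketch of the salted-hash check performed at login.
# Real /etc/shadow entries use crypt(3) schemes, not plain SHA-256.
def verify(password: str, salt: bytes, stored_hash: str) -> bool:
    return hashlib.sha256(salt + password.encode()).hexdigest() == stored_hash

salt = b"Xy"                                             # stored with the hash
stored = hashlib.sha256(salt + b"hunter2").hexdigest()   # set at password change
print(verify("hunter2", salt, stored))   # True
print(verify("letmein", salt, stored))   # False
```

The salt ensures that two users with the same password get different stored hashes, which defeats precomputed-table attacks.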

Authorization

Access control uses the process's effective UID and its group set: if the process's effective UID equals the file's owner UID, the owner permission bits apply; otherwise, if the file's GID is one of the process's groups, the group bits apply; otherwise, the bits for others apply.
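The check just described can be sketched as follows. This is illustrative Python only: the kernel works on inodes and the process credential structure, and `allowed` is a hypothetical helper, not a real API.

```python
def allowed(op, file_uid, file_gid, mode, proc_euid, proc_groups):
    """Check one operation ('r', 'w', or 'x') the way classic UNIX does:
    exactly one of the owner/group/others permission triplets applies."""
    bit = {'r': 4, 'w': 2, 'x': 1}[op]
    if proc_euid == file_uid:          # owner triplet applies
        triplet = (mode >> 6) & 7
    elif file_gid in proc_groups:      # group triplet applies
        triplet = (mode >> 3) & 7
    else:                              # "others" triplet applies
        triplet = mode & 7
    return bool(triplet & bit)

# mode 0o640: owner rw-, group r--, others ---
print(allowed('r', 1000, 100, 0o640, 1000, [100]))  # True: owner may read
print(allowed('w', 1000, 100, 0o640, 1001, [100]))  # False: group may not write
```

Note that only one triplet is consulted: an owner whose owner bits deny access is denied even if the group or others bits would allow it.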

UID Transition

  • During the login process, the process runs with user ID root.
  • After authentication, the user ID of the shell becomes that of the authenticated user (e.g., "paolo").
  • Setuid lets a user run a program with escalated privileges (e.g., as root) while still controlling its execution environment.
  • Setuid services must therefore protect themselves; otherwise, a user might gain root access.

UNIX Object

  • Unix objects are represented as files, which can be categorized into:
    • Regular files
    • Device files
    • Socket files
    • FIFO files
    • Link files
  • Directories are a hierarchical organization of files. A file's path is a sequence of directories followed by the file name.
  • Beyond socket files, there is no network access control.

Permissions

File permissions are divided into three categories: Owner, Group, and Others.

The three types of permissions are Read, Write, and Execute, represented by rwx

ex) chmod 644 file - the owner can read and write; the group and others can read only.

  • chmod is a command used to change the permissions of a file.
  • chown is a command used to change the owner of a file.
  • chgrp is a command used to change the group of a file.
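The numeric modes used by chmod map onto three rwx triplets (owner, group, others). A small sketch of that decoding; Python's stat.filemode does the full job, including the leading file-type character.

```python
# Render a numeric permission mode the way `ls -l` shows it.
def rwx(mode):
    out = []
    for shift in (6, 3, 0):            # owner, group, others triplets
        bits = (mode >> shift) & 7
        out.append(('r' if bits & 4 else '-') +
                   ('w' if bits & 2 else '-') +
                   ('x' if bits & 1 else '-'))
    return ''.join(out)

print(rwx(0o644))   # rw-r--r--
print(rwx(0o755))   # rwxr-xr-x
```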

Chroot

Chroot creates a domain in which a process is confined: the process can read and write only within a filesystem subtree, and the restriction applies to all descendant processes. Open file descriptors can be carried into a 'chroot jail'. Setting up a jail requires great care because, unfortunately, a poorly constructed jail can be escaped.

UNIX vulnerabilities

  • Some UNIX functions have security problems, including:
    • The ability to mount a CD-ROM
    • The ability to mount a filesystem with privileged functions
  • These vulnerabilities can either be system-wide issues or require careful programming to prevent buffer overflows.
  • Mount vulnerabilities arise because file systems on different physical devices are mounted into a single tree under /, and a mounted file system may carry an attacker-planted setuid program.
  • Link vulnerabilities: a hard link adds a new path to an inode, so one inode can have multiple names; programs must take care that they are operating on the file they intend.
  • Device file vulnerabilities can bypass access control, for example by giving direct access to memory or to other users' input devices.
  • The /tmp race: an attacker creates a file (or symbolic link) in the shared /tmp directory under a filename that a privileged service is known to use, so that the service ends up accessing the attacker's file.

Linux Vulnerabilities

  • Buffer overflows
  • Race conditions
  • Abuse of programs running with "setuid root"
  • Denial-of-service (DoS) attacks
  • Web application vulnerabilities
  • Rootkit attacks

System Security Tools

  • Bastille: A comprehensive system-hardening utility that educates as it secures.
  • Tripwire: A utility that maintains a database of crucial system file characteristics and reports all changes made to them.
  • Snort: A powerful, free intrusion detection system (IDS) that detects common network-based attacks.
  • Nessus: A modular security scanner that probes for common system and application vulnerabilities.

Cryptography

· 7 minute read

History of Cryptography

Cryptography is a method of keeping information secret that has been around for more than 2500 years. It has always been a competition between those who create codes and those who break them. In the past, people used things like paper and ink, cryptographic engines, telegrams, and radio to do cryptography. Today, computers and digital communication are used for modern cryptography.

Terminology

  • Plaintext: the original message
  • Ciphertext: the transformed message
  • Key: the secret used in the transformation
  • Encryption: the process of transforming plaintext into ciphertext
  • Decryption: the process of transforming ciphertext back into plaintext
  • Cipher: the algorithm used for encryption/decryption

Security Goals (CIA)

  • Confidentiality: Only authorized individuals may access the information.
  • Integrity: Only authorized parties can change the information using authorized methods.
  • Availability: Authorized individuals can access the information when they need it.

Security Principles

  • Hiding information is not a reliable way to keep it secure.
  • Assume the person trying to break in knows how your security system works, except for your password or secret key.

Secure Communication

Secure communication is a way of protecting data from unauthorized access. It involves encryption and other measures to ensure data is securely transmitted between two or more parties. Encryption is scrambling data so that it can only be read by the intended recipient.

Approaches to Secure Communication

  • Steganography: covered writing that hides the existence of a message. It depends on the secrecy of the method used.
  • Cryptography: hidden writing that hides the meaning of a message. It depends on the secrecy of a short key, not the method used.

Cryptography, Cryptanalysis, Cryptology

  • Cryptography is the process of creating algorithms and protocols that are used for security purposes. It is often used interchangeably with cryptology.
  • Cryptanalysis is the process of breaking cryptography.
  • Cryptology refers to both cryptography and cryptanalysis, but this term is becoming less common.

Shift Cipher (Caesar’s cipher)

  • Each letter in the plaintext is replaced by a letter K positions ahead of it in the alphabet.
  • An attacker can find K by trying every possible key, since there are only 25 nontrivial keys (key 0 returns the plaintext unchanged).
  • Once K is found, decryption is trivial.
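A minimal sketch of the shift cipher and the brute-force attack described above (illustrative Python, uppercase letters only):

```python
def shift_encrypt(plaintext, k):
    # Replace each letter with the letter k positions ahead (mod 26).
    out = []
    for ch in plaintext.upper():
        if ch.isalpha():
            out.append(chr((ord(ch) - 65 + k) % 26 + 65))
        else:
            out.append(ch)
    return ''.join(out)

def brute_force(ciphertext):
    # Only 26 keys exist, so an attacker simply tries them all.
    return [shift_encrypt(ciphertext, -k) for k in range(26)]

c = shift_encrypt("ATTACK AT DAWN", 3)
print(c)                                   # DWWDFN DW GDZQ
print("ATTACK AT DAWN" in brute_force(c))  # True
```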

shift-cipher.png

Substitution Cipher

  • Each letter X in the plaintext P is replaced with a letter Y determined by a fixed secret permutation of the alphabet.
  • They dominated the art of secret writing throughout the first millennium A.D.
  • They were thought to be unbreakable by many back then.
  • Substitution ciphers are vulnerable to frequency analysis attacks. Each language has certain features such as the frequency of letters or groups of two or more letters.
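A tiny frequency-analysis sketch: count letter frequencies in a text and guess that the most common letter corresponds to 'E', the most frequent letter in English. A substitution cipher preserves these frequencies, which is exactly what the attack exploits; the short sample below stands in for a much longer ciphertext.

```python
from collections import Counter

def letter_frequencies(text):
    # Relative frequency of each letter, most common first.
    letters = [c for c in text.upper() if c.isalpha()]
    total = len(letters)
    return {c: n / total for c, n in Counter(letters).most_common()}

sample = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG THE END"
freqs = letter_frequencies(sample)
print(max(freqs, key=freqs.get))   # E
```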

substitution-cipher.png

One-Time Pad (OTP)

  • OTP improves substitution cipher by using different keys to encrypt letters in different positions.
  • The key should be a random string that is at least as long as the plaintext, according to Shannon's theorem.
  • Shannon's theory states that the ciphertext should not reveal any information about the plaintext.
  • Encryption is similar to the shift cipher.
  • OTP was invented by Vernam in the 1920s.
  • Never use the same key in OTP more than once.
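OTP encryption is usually written as XOR over bytes; because XOR is its own inverse, applying the same key a second time decrypts. A minimal sketch:

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # Shannon: the key must be truly random, used only once,
    # and at least as long as the message.
    assert len(key) >= len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

msg = b"meet me at noon"
key = secrets.token_bytes(len(msg))   # fresh random key per message
ct = otp_encrypt(msg, key)
print(otp_encrypt(ct, key) == msg)    # True: XOR with the same key decrypts
```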

Stream Cipher

  • The One-Time Pad encryption method uses a random key that is at least as long as the message being encrypted.
  • Stream ciphers use a Pseudo Random Number Generator (PRNG) to create a sequence of numbers that appear random.
  • Stream ciphers are usually fast.
  • RC4 is a stream cipher designed at RSA Security; it was kept as a trade secret until the code was leaked in 1994.
  • RC4 was long used in SSL/TLS, but both RC4 and SSLv3 are now considered broken and are prohibited in modern TLS.
  • If the same stream is used more than once, it becomes easy to break.
  • These vulnerabilities are basic and still exist regardless of the strength of the PRNG.
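The keystream-reuse weakness can be seen directly: XORing two ciphertexts produced under the same keystream cancels the key entirely, leaving the XOR of the two plaintexts for the attacker to analyze. A sketch (the fixed byte string below stands in for a PRNG keystream):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = bytes(range(10, 25))              # stand-in for a PRNG keystream
p1, p2 = b"attack at dawn!", b"retreat at noon"
c1, c2 = xor(p1, key), xor(p2, key)     # same keystream used twice
print(xor(c1, c2) == xor(p1, p2))       # True: the key dropped out
```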

Pseudo Random Number Generator (PRNG)

  • A PRNG is a tool used for cryptography and simulation.
  • When using the same seed, a PRNG will always give the same output stream.
  • For a PRNG to be cryptographically secure, it needs to have unpredictable sequences.
  • PRNGs are also useful for generating temporary keys and other similar tasks.
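Determinism from the seed is easy to demonstrate with Python's random module (which is suitable for simulation but NOT cryptographically secure; use the secrets module for keys):

```python
import random

# Same seed, same stream: this is what makes a PRNG reproducible,
# and why a cryptographic PRNG must be seeded unpredictably.
a = random.Random(42)
b = random.Random(42)
print([a.randrange(256) for _ in range(5)] ==
      [b.randrange(256) for _ in range(5)])   # True
```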

Adversarial Models for Ciphers

  • The attacker is assumed to know the language of the original message and the type of encryption used.
  • Ciphertext-only attack: The attacker can only see some encrypted messages.
  • Known-plaintext attack: The attacker has seen some pairs of encrypted and decrypted messages.
  • Chosen-plaintext attack: The attacker can choose some messages and see the corresponding encrypted messages.
  • Chosen-ciphertext attack: The attacker can choose some encrypted messages and see the corresponding decrypted messages.

Data Encryption Standard (DES)

  • DES was designed by IBM with modifications proposed by the National Security Agency.
  • From 1977 to 2001, it was a US national standard.
  • It uses a block size of 64 bits and a key size of 56 bits.
  • It has 16 rounds and was mostly designed for hardware implementations.
  • However, it is now considered insecure and vulnerable to brute-force attacks.

Data Integrity and Source Authentication

Encryption does not fully protect data from being changed by someone else. To make sure the data gets to its destination without being changed and to make sure it comes from a known source, we need something else.

Cryptographic Hash Function

A hash function maps a message of arbitrary length to an m-bit output.

Since a hash function is a many-to-one function, collisions can occur.

Hash functions are used for the following examples:

  • Software integrity
  • Timestamping
  • Message authentication
  • One-time passwords
  • Digital signatures

Well-known hash functions

  • MD5 - produces a 128-bit output. It has been completely broken by researchers in terms of collision resistance.
  • SHA1 - produces a 160-bit output. Collisions can be found in far fewer than 2^80 attempts, and a practical collision was published in 2017. It is considered insecure for collision resistance.
  • SHA2 (SHA-224, SHA-256, SHA-384, SHA-512) - produces 224, 256, 384, and 512 bits output, respectively.
  • Hash outputs should generally be twice the key length of block ciphers due to the birthday attack.
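The output sizes listed above, and the avalanche effect (one changed input bit flips roughly half the output), can be checked with Python's hashlib:

```python
import hashlib

msg = b"hello world"
for name in ("md5", "sha1", "sha256", "sha512"):
    h = hashlib.new(name, msg)
    print(name, h.digest_size * 8)   # output length in bits

# Avalanche effect: a one-character change gives an unrelated digest.
d1 = hashlib.sha256(b"hello world").hexdigest()
d2 = hashlib.sha256(b"hello worle").hexdigest()
print(d1 != d2)   # True
```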

Limitations of Using Hash Functions for Authentication:

  • Anyone can calculate the hash of a message, since the hash function is public.
  • It is not always possible to have a secure channel.

To fix these issues, you can:

  • Use multiple hash functions.
  • Select a hash function with a key.

Encryption and Authentication (3 Ways)

  • Authenticate-then-encrypt (AtE): used in SSL. AtE is not always secure, because the receiver must decrypt first, and whether decryption succeeded can leak information (as in padding-oracle attacks).
  • Encrypt-then-authenticate (EtA): used in IPSec. The MAC is computed over the ciphertext, so forged messages are rejected before decryption; EtA is the generically secure option.
  • Encrypt-and-authenticate (E&A): used in SSH. The MAC is computed over the plaintext, so the MAC itself does not guarantee confidentiality.

RSA Algorithm

The most common public-key algorithm is called the RSA method, named after its inventors (Rivest, Shamir, Adleman). In this algorithm, the private key is a pair of numbers (N, d), and the public key is a pair of numbers (N, e). It should be noted that N is common to both the private and public keys.
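A toy sketch with textbook-small primes shows the relationship between the public key (N, e) and the private key (N, d); real RSA uses primes hundreds of digits long and padding such as OAEP.

```python
# Toy RSA key generation and round trip (NOT secure: tiny primes, no padding).
p, q = 61, 53
N = p * q                  # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: d*e ≡ 1 (mod phi)

m = 65                     # message encoded as a number < N
c = pow(m, e, N)           # encrypt with the public key (N, e)
print(pow(c, d, N))        # decrypt with the private key (N, d) -> 65
```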

Shortest Path

· 3 minute read

Given a weighted graph and two vertices u and v, find a path of minimum total weight between u and v.

applications:

  • internet packet routing
  • flight reservations
  • driving directions

single source on unweighted graph ⇒ just use BFS

Dijkstra algorithm

This algorithm constructs the shortest-path tree from a source node. It does not allow negative edge weights and is similar to Prim's algorithm. The algorithm employs a greedy strategy and has a complexity of O(|E| + |V|log|V|) when implemented with a Fibonacci heap. It is widely used in practice, for example in GPS navigation software. Since negative edge weights rarely occur in the real world, Dijkstra's algorithm is one of the most suitable algorithms for practical use.

  1. Set the starting node.
  2. Save the minimum cost of each node based on the starting node.
  3. Select the node with the lowest cost among the nodes that have not been visited.
  4. Consider the case of going to a specific node through the selected node and update the minimum cost.
  5. Repeat steps 3-4 until all nodes have been visited.
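The steps above can be sketched with a binary heap as the priority queue (illustrative Python; the adjacency-list shape is an assumption):

```python
import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbor, weight), ...]}, weights must be non-negative
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    pq = [(0, source)]                 # (cost so far, node)
    while pq:
        d, u = heapq.heappop(pq)       # step 3: cheapest unvisited node
        if d > dist[u]:
            continue                   # stale heap entry, already settled
        for v, w in graph[u]:
            if d + w < dist[v]:        # step 4: relax edges through u
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2)], 'C': []}
print(dijkstra(g, 'A'))   # {'A': 0, 'B': 1, 'C': 3}
```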

Bellman-ford algorithm

When all edge costs are non-negative, you can use Dijkstra's algorithm. However, if the graph contains negative edges, Dijkstra's algorithm may fail, and a negative cycle drives some distances to negative infinity. In these cases, you should use the Bellman-Ford algorithm, which handles negative edges and can detect negative cycles. Its basic time complexity is O(VE), slower than Dijkstra's algorithm.

  1. Choose a starting node.
  2. Initialize the shortest distance table as infinite.
  3. Repeat the following process n-1 times: Check each of the total e edges one by one. Calculate the cost of going to other nodes through each edge and update the shortest distance table.
  4. If you want to check for the existence of a negative cycle, repeat step 3 once more. If the shortest distance table is updated again, a negative cycle exists.
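A sketch of steps 1-4 (vertices numbered 0..n-1; the edge-list shape is an assumption):

```python
def bellman_ford(n, edges, source):
    # edges: list of (u, v, weight); negative weights are allowed
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):             # step 3: relax all edges n-1 times
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:              # step 4: one extra pass detects cycles
        if dist[u] != INF and dist[u] + w < dist[v]:
            return None                # negative cycle reachable from source
    return dist

edges = [(0, 1, 4), (0, 2, 5), (1, 2, -3)]
print(bellman_ford(3, edges, 0))   # [0, 4, 1]
```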

dijkstra_bellman_ford

Floyd-Warshall

Dijkstra's and Bellman-Ford algorithms are algorithms that find the shortest path from one vertex to another when starting from a single vertex. However, if you want to find the shortest paths between all pairs of vertices, you need to use the Floyd-Warshall algorithm. While Dijkstra's algorithm chooses the least cost one by one, the Floyd-Warshall algorithm performs the algorithm based on the intermediate vertex.

dijkstra_bellman_ford

The green zone is the area that can change in a given round. Entries on the diagonal (a vertex to itself) and entries in the row or column of the current intermediate vertex do not change.

The key idea of the Floyd-Warshall algorithm is to find the shortest distance based on the intermediate vertex. Calculate the number of cases where all n nodes are passed through.

The time complexity is O(V^3).
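The idea above reduces to a triple loop, with k as the intermediate vertex; here is a minimal sketch over an adjacency matrix:

```python
def floyd_warshall(dist):
    # dist: n x n matrix; dist[i][j] = direct edge weight or inf, 0 on diagonal
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):                 # k = intermediate vertex
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

INF = float('inf')
g = [[0, 3, INF],
     [INF, 0, 2],
     [1, INF, 0]]
print(floyd_warshall(g))   # [[0, 3, 5], [3, 0, 2], [1, 4, 0]]
```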

Minimum Spanning Tree (MST)

· 2 minute read

A spanning tree is a tree that connects all the vertices of a graph through some of its edges. If a graph has N vertices, its spanning tree has N-1 edges. A minimum spanning tree is a spanning tree of a weighted graph with minimum total edge weight. The lightest edge plays a crucial role in finding the MST. Two classic algorithms find minimum spanning trees: Kruskal's algorithm and Prim's algorithm.

Applications:

  • Communication networks
  • Transportation networks

Prim's Algorithm

  • Prim's algorithm uses a greedy strategy.
  • It starts with an arbitrary vertex v and chooses the minimum edge that connects v to a new vertex w.
  • This process is repeated until the minimum spanning tree (MST) is found.
  • Utilizes a binary heap to store the edges outside clusters.
  • The running time is O(E log V) since the graph is connected.
  • The space complexity is either O(E + V) using a binary heap or O(V) using a Fibonacci heap.
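A minimal sketch of Prim's algorithm with a binary heap, returning just the MST weight (the adjacency-list shape is an assumption):

```python
import heapq

def prim(graph, start):
    # graph: {v: [(weight, neighbor), ...]} for an undirected connected graph
    visited = {start}
    heap = list(graph[start])          # edges leaving the initial vertex
    heapq.heapify(heap)
    total = 0
    while heap:
        w, v = heapq.heappop(heap)     # lightest edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        total += w
        for edge in graph[v]:
            if edge[1] not in visited:
                heapq.heappush(heap, edge)
    return total

g = {'A': [(1, 'B'), (4, 'C')],
     'B': [(1, 'A'), (2, 'C')],
     'C': [(4, 'A'), (2, 'B')]}
print(prim(g, 'A'))   # 3  (edges A-B and B-C)
```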

Kruskal's Algorithm

  • Scans the edges only once.
  • Works with disconnected graphs
  • Ensures that the result is a tree with no cycles.
  • Utilizes a priority queue to store the edges outside clusters.
  • Maintains a forest of trees.
  • The running time is O(E log E), which is equivalent to O(E log V) since E ≤ V².
  • The space complexity for storing the graph and the disjoint-set data structure is O(V + E).
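A sketch of Kruskal's algorithm using a disjoint-set (union-find) structure to maintain the forest; sorting the edge list plays the role of the priority queue:

```python
def kruskal(n, edges):
    # edges: (weight, u, v) with vertices 0..n-1
    parent = list(range(n))            # union-find over the forest of trees

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # scan the edges once, lightest first
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding this edge creates no cycle
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return total, mst

edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2)]
print(kruskal(3, edges))   # (3, [(0, 1), (1, 2)])
```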

minimun-spanning-tree

Buffer Overflow

· 3 minute read

Overview

Buffers are memory storage regions that temporarily hold data while it is being transferred from one location to another.

Buffer overflow happens when data quantity exceeds memory buffer storage capacity. This can occur due to a programming mistake when a process tries to store data outside the limits of a fixed-sized buffer. Consequently, the program that attempts to write data to the buffer overwrites adjacent memory locations.

The consequence of a buffer overflow is the corruption of data, unexpected transfer of control, and possible memory access violations.

Exploiting

buffer_overflow

To exploit a buffer overflow, the attacker needs to identify a buffer overflow vulnerability and understand how that buffer will be stored in the process memory.

For example, a buffer for login credentials may be designed to expect 8-byte inputs for both the username and password. If a transaction involves an input of 10 bytes (2 bytes more than expected), the program may write the excess data past the buffer boundary.

If attackers know the memory layout of a program, they can intentionally feed input that the buffer cannot store and overwrite areas that hold executable code, replacing it with their own code.

Types of buffer overflow attacks

Stack-based attacks:

This type of attack is more common and leverages stack memory that only exists during the execution time of a function.

Heap-based attacks:

This type of attack can be difficult to carry out and may involve flooding the memory space allocated for a program beyond the memory used for current runtime operations.

Program Memory layout

buffer_overflow

When a program runs, it needs memory space to store data. For a C program, its memory is divided into five segments, each with its own purpose:

Text Segment:

Stores the executable code. Usually read-only.

Data Segment:

Stores static/global variables.

BSS Segment:

Stores uninitialized static/global variables. Filled with zeros by the OS.

Heap:

Used to provide space for dynamic memory allocation.

Stack:

Used for storing local variables defined inside functions.

Prevent Strategy

Address Space Randomization (ASLR):

ASLR randomly arranges the address space positions of key data regions. Buffer overflow attacks typically need to know where executable code lives; randomizing the address space makes attacks that rely on known addresses far harder.

Data Execution Prevention:

Data Execution Prevention is a security feature that flags certain areas of memory as non-executable or executable. This prevents an attacker from running code in a non-executable region.

Structured Exception Handler Overwrite Protection (SEHOP):

SEHOP is a built-in system that helps prevent malicious code from attacking structured exception handling. It manages hardware and software exceptions, thus preventing an attacker from using a stack-based buffer overflow to overwrite an exception registration record stored on a thread's stack.

Cases

  • Morris worm in 1988
  • Code Red worm in 2001
  • SQL Slammer in 2003
  • Stagefright attack against Android phones in 2015
  • CVE-2021-21017 (impacts Adobe Acrobat Reader)

Virtualization

· 6 minute read

Virtualization is the process of creating a simulated computing environment or a computer-generated computer. It provides an abstraction layer between computing, storage, and networking hardware and applications. Virtualization makes hardware independent of operating systems (OS) and applications and can be provisioned to any system. The OS can be encapsulated into one virtual system. Virtualization allows us to create multiple computing instances from a single machine. These instances could be computers in the traditional sense, or storage repositories, servers, or networking configurations.

The software that enables virtualization is called a hypervisor. The hypervisor is a lightweight software layer installed on top of hardware to create virtualized environments, allowing multiple operating systems to run on the same hardware. The hypervisor acts as a middleman, pulling resources from the raw hardware and directing them to the various computing instances.

Types of hypervisor

Type 1 - Bare Metal Hypervisor

A Type 1 Hypervisor, also known as a Bare Metal Hypervisor, runs directly on computer hardware instead of the operating system. Therefore, Type 1 Hypervisors offer better performance and are commonly used in enterprise applications.

Type 2 - Hosted Hypervisor

A Type 2 hypervisor runs as an application on an operating system. It is used when running various operating systems on a single machine.

Types of Virtualization

OS-level virtualization

Operating systems allow for the running of multiple secure virtual servers, making subsystems think they are running on their own operating system. Guest OSes are the same as the host OS but appear isolated. Examples include Docker, Linux Vserver, Solaris Containers, and FreeBSD Jails.

Application-level virtualization

The application behaves at runtime in a similar way to when it directly interfaces with the original operating system. The application provides its own copy of components that are not shared. The application virtualization layer replaces part of the runtime environment normally provided by the OS. Examples include the Java Virtual Machine (JVM)

Full/Native Virtualization

Virtual machines simulate enough hardware to allow an unmodified guest to run in isolation. Any software capable of executing on the hardware can be run in the virtual machine. Every operation performed within a given virtual machine must be kept within that virtual machine. Intercepting and simulating privileged operations is challenging. Examples include VirtualBox, Parallels, and Mac on Linux.

Paravirtualization

In paravirtualization, the virtual machine (VM) does not simulate hardware. Instead, it presents a software interface to the VM that is similar to, but not identical to, the underlying hardware. A special API (para-API) is used to achieve this, which the modified guest OS must also use. The hypervisor then traps these hypercalls and services them. Examples include Xen.

Emulation

A virtual machine (VM) emulates complete hardware and software, while an emulator is a hardware or software tool that enables a system (host) to behave like another system (guest). Emulators are useful for reverse engineering, malware analysis, and forensics. Examples include VirtualPC for Mac and the Android emulator.


Virtual Machine

In the early days, software was written for specific hardware. Eventually, instruction sets became more standardized, but the software still requires a certain instruction set architecture and operating system that meets strict standards.

A virtual machine can be categorized as a “system virtual machine” or a “process virtual machine”. The former provides a full execution environment that can support multiple processes and I/O devices, as well as a GUI. The latter is instantiated for a single program and terminates when that process terminates.

Popular VM providers include VirtualBox, Xen Project, Microsoft Hyper-V, and VMware Workstation Player.

Java Virtual Machine

The Java Virtual Machine (JVM) interprets Java bytecode, allowing the same bytecode to run on all architectures. The JVM can also run multiple programs simultaneously.

Linux Containers(LXC)

LXC is an OS-level virtualization method for running multiple isolated Linux systems on a single control host. It does not provide a virtual machine but rather a virtual environment with its own CPU, memory, block I/O, network, etc. This is achieved through namespace and cgroups features in the Linux kernel on the LXC host.

Benefits

This technology allows the execution of multiple operating systems on a single physical machine. This results in cost savings when compared to using separate physical machines. Furthermore, the technology has well-established functionality, robust management tools, and well-known security tools and controls.

Vulnerabilities

Virtual Machines (VMs) are susceptible to various types of attacks. These include:

  • Hardware-oriented attacks
  • Management interface exploits
  • Break out of jail attacks
  • Almost-undetectable virtual-machine based rootkits (Blue Pill)
  • Application privilege escalation
  • Just-in-time spraying
  • Untrusted native code execution

VM vs. Containers

Virtual Machine:

  • Provides complete isolation
  • Requires large storage space
  • Has high overhead
  • Supports multiple OSes
  • Provides greater stability for hypervisors and VMs
  • Provides better security

Container:

  • Provides application-level abstraction
  • Lightweight
  • Works well with Linux, but has limited support for Windows
  • Weak security
  • Significant management overhead
  • Not well-suited for large applications
  • Important for microservices design

Namespaces

Namespaces restrict what a container can see and provide process-level isolation of global resources. Processes have the illusion that they are the only process in the system. There are six namespaces:

  • mnt (mount points)
  • pid (processes)
  • net (network stack)
  • ipc (System V IPC)
  • uts (hostname)
  • user (UIDs)

Docker

Docker is an open-source project that automates the deployment of applications inside software containers. It provides an additional layer of abstraction and automates OS-level virtualization on Linux. Docker uses resource isolation features of the Linux kernel, such as cgroups and kernel namespaces, to allow independent containers to run within a single Linux instance, avoiding the overhead of starting and maintaining virtual machines.

Docker includes the libcontainer library as a way to directly use virtualization facilities provided by the Linux kernel, in addition to using abstracted virtualization interfaces via libvirt, LXC, and systemd-nspawn. Docker implements a high-level API to provide lightweight containers that run processes in isolation. Unlike a traditional virtual machine, a Docker container does not require or include a separate operating system. Instead, it relies on the kernel's functionality, using resource isolation (cgroups) and separate namespaces to isolate the application's view of the operating system.



Dart functional programming

· 3 minute read

Functional programming creates new forms of values using type conversion and built-in functions.

It is possible to chain the results and ultimately make the code more concise.

Usage

type casting

// List <-> Map <-> Set

// Cast in List
List<String> blackPink = ['a', 'b', 'c', 'd',];
final Map blackPinkMap = blackPink.asMap(); // {0: a, 1: b, 2: c, 3: d}
final Set blackPinkSet = blackPink.toSet(); // {a, b, c, d}

// Cast from Map
final Map blackPinkAsMap = blackPink.asMap();
List<dynamic> keys = blackPinkAsMap.keys.toList(); // [0, 1, 2, 3]
List<dynamic> values = blackPinkAsMap.values.toList(); // [a, b, c, d]

// Cast from Set
Set blackPinkFromSet = Set.from(blackPink); // {a, b, c, d}
List blackSet = blackPinkFromSet.toList(); // [a, b, c, d]

map()

  • Methods of List
  • Returns a new list after changing the format of all member variables.
  • The list created by map() has a different address value even if the member variables are the same.

iterable:

A collection that can be iterated over (list, array)

Use the .toList() method to convert to a list

List<String> blackPink = ['a', 'b', 'c', 'd'];

blackPink.map((item) {
return 'this is $item';
});

// with arrow function
final blackPink2 = blackPink.map((item) => 'black pink $item');

mapping Map

Map<String, String> harryPotter = {
'harry potter': '해리포터',
'ron': '론',
'hermione': '헤르미온느'
};

// When mapping a map, you receive both key and value.
final result = harryPotter.map(
(key, value) => MapEntry('harry potter character $key', '해리포터 캐릭터 $value'));

void main() {
  // key
  print(harryPotter.keys.map((item) => 'key is $item').toList());
  // value
  print(harryPotter.values.map((item) => 'value is $item').toList());
}

mapping Set

Set blackPinkSet = {'rose', 'jisu', 'jenny', 'lisa'};

final newSet = blackPinkSet.map((item) => 'this is $item').toSet();

where()

Used to filter elements that satisfy a condition

List<Map<String, String>> people = [
{
'name': '제니',
'group': '블랙핑크',
},
{
'name': '로제',
'group': '블랙핑크',
},
{
'name': 'RM',
'group': 'BTS',
}
];

final result = people.where((item) => item['group'] == 'BTS').toList();

reduce()

  • On the first iteration, the first two elements of the list are passed in as the previous and next values.
  • After that, the return value of the callback becomes the previous value of the next iteration.
  • reduce() can only return a result of the same type as the list's elements.
  • This limitation can be overcome with fold()
List<int> numbers = [1, 2, 3, 4, 5];

int resultInt = numbers.reduce((prev, next) {
  return prev + next;
});

// arrow function
numbers.reduce((prev, next) => prev + next);

fold()

  • Like reduce(), but it also lets you set an initial value and choose the return type
  • On the first iteration, the initial value goes into prev and the first element goes into next
List<int> numbers = [1, 2, 3, 4, 5];

// Specify the return type as Generic
numbers.fold<int>(0, (prev, next) => prev + next);

List<String> words = ['Hi', 'Hello'];

words.fold<String>('', (prev, next) => prev + next); // HiHello
words.fold<int>(0, (prev, next) => prev + next.length); // 7

...spread operator

  • Used when combining lists
  • The elements are unpacked and added to the new list
  • A new list is created
List<int> even = [2, 4, 6];
List<int> odd = [1, 3, 5];
List newArr = [...even, ...odd];

// `[...even]` and `even` are different.

Introduction to K-way Merge

· 4 minute read
Mr. Frugal
I'm afraid of paying full price.

About the Pattern

The K-way merge pattern is a fundamental algorithmic strategy used to merge K sorted data structures—such as arrays and linked lists—into a single sorted structure. It extends the traditional merge sort algorithm, which combines two sorted structures into one.

Note: For simplicity, the term lists will refer to both arrays and linked lists in this discussion.

The core idea behind the K-way merge algorithm is simple: repeatedly select the smallest (or largest, for descending order) element from the first elements of the K input lists and append it to a new output list. This process continues until all elements are merged into the output list, maintaining the sorted order.

How the Algorithm Works

The K-way merge algorithm has two main approaches for achieving its goal, each leading to the same result. Let's explore these approaches in detail.


Approach 1: Using a Min Heap

  1. Initialize the Heap
    Insert the first element of each list into a min heap. This heap helps efficiently track the smallest current element among the lists.

  2. Build the Output List
    Remove the smallest element from the heap (always at the top) and add it to the output list.

  3. Track Element Origins
    Keep track of which list each element in the heap originated from to determine the next element to add.

  4. Replace Elements in the Heap
    After removing the smallest element, replace it with the next element from the same list.

  5. Repeat Steps
    Repeat steps 2–4 until all elements are merged into the output list.
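The steps above can be sketched in Python with the standard heapq module (the function name k_way_merge is our own, used for illustration):

```python
import heapq

def k_way_merge(lists):
    """Merge k sorted lists using a min heap of (value, list index, element index)."""
    heap = []
    # Step 1: seed the heap with the first element of each non-empty list.
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))

    merged = []
    while heap:
        # Step 2: the smallest current element is always at the top.
        value, i, j = heapq.heappop(heap)
        merged.append(value)
        # Steps 3-4: the indices (i, j) track the element's origin,
        # so we can replace it with the next element from the same list.
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return merged

print(k_way_merge([[1, 4, 7], [2, 5], [3, 6, 8]]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

With K lists and n total elements, the heap never holds more than K entries, so the merge runs in O(n log K).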


Approach 2: Grouping and Merging Pairs of Lists

  1. Divide into Pairs
    Divide the K sorted lists into pairs, grouping them for two-way merges.

  2. Perform Two-Way Merges
    For each pair, perform a standard two-way merge to create a single sorted list for that pair.

  3. Handle Odd Lists
    If there is an odd number of lists, leave one list unmerged for the current round.

  4. Repeat Pairing and Merging
    Continue pairing and merging the resulting lists until only one sorted list remains.
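This pairwise approach can be sketched as follows (the helper names are our own):

```python
def merge_two(a, b):
    """Standard two-way merge of two sorted lists."""
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(b[j]); j += 1
    merged.extend(a[i:])  # append whichever list still has elements
    merged.extend(b[j:])
    return merged

def k_way_merge_pairs(lists):
    """Repeatedly merge pairs of lists until one sorted list remains."""
    if not lists:
        return []
    while len(lists) > 1:
        next_round = []
        # Steps 1-2: merge each pair with a two-way merge.
        for i in range(0, len(lists) - 1, 2):
            next_round.append(merge_two(lists[i], lists[i + 1]))
        # Step 3: an odd list carries over to the next round unmerged.
        if len(lists) % 2 == 1:
            next_round.append(lists[-1])
        lists = next_round
    return lists[0]
```

Each round halves the number of lists, so there are O(log K) rounds over n total elements, again giving O(n log K) overall.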


Examples of Problems Solved with K-way Merge

  1. Median of K Sorted Arrays
    Find the median of K sorted arrays.

  2. K-th Smallest Element Across Arrays
    Find the K-th smallest element among multiple sorted arrays.


Identifying Problems That Match This Pattern

A problem likely fits this pattern if:

  • Merging Sorted Structures: It involves merging multiple sorted arrays or matrices.
  • Finding K-th Smallest/Largest: It requires identifying the K-th smallest or largest element across sorted collections.

Real-World Applications

  1. Patient Records Aggregation
    Combine sorted streams of patient data (lab results, notes, reports) into a unified, chronologically ordered record for comprehensive healthcare management.

  2. Merging Financial Transaction Streams
    Merge sorted transactions (trades, payments, transfers) into a single stream for real-time analysis and fraud detection.

  3. Log File Analysis
    Merge server logs into a single, time-ordered stream for performance monitoring, user behavior analysis, or error diagnosis.


Merge Sorted Array solution:

The following example demonstrates merging two sorted arrays in place:

  1. Initialize two pointers: p1 (last element of nums1) and p2 (last element of nums2).
  2. Initialize a pointer p pointing to the last element of nums1.
  3. Compare elements:
    • If nums1[p1] > nums2[p2], set nums1[p] = nums1[p1] and decrement p1 and p.
    • Otherwise, set nums1[p] = nums2[p2] and decrement p2 and p.
  4. Repeat until all elements of nums2 are merged into nums1.

The algorithm processes the arrays in reverse order, starting from the largest elements and filling nums1 from the end. This approach prevents overwriting existing data in nums1 as it is merged in place. The merge only considers the first m elements in nums1 and the first n elements in nums2. Any extra space in nums1 beyond m + n is irrelevant for the merging process.
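A minimal Python sketch of this in-place merge, following the usual nums1/m/nums2/n signature described above:

```python
def merge(nums1, m, nums2, n):
    """Merge nums2 into nums1 in place; nums1 has length m + n."""
    p1, p2 = m - 1, n - 1  # last real elements of nums1 and nums2
    p = m + n - 1          # write position, starting from the end of nums1
    while p2 >= 0:
        # Fill from the back with the larger of the two candidates,
        # so existing nums1 data is never overwritten before it is read.
        if p1 >= 0 and nums1[p1] > nums2[p2]:
            nums1[p] = nums1[p1]
            p1 -= 1
        else:
            nums1[p] = nums2[p2]
            p2 -= 1
        p -= 1
    return nums1

print(merge([1, 2, 3, 0, 0, 0], 3, [2, 5, 6], 3))  # → [1, 2, 2, 3, 5, 6]
```

The loop only needs to run while nums2 has unmerged elements; any leftovers in nums1 are already in place.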

Introduction to Two Heaps

· 4 minute read
Mr. Frugal
I'm afraid of paying full price.

About the Pattern

The two heaps pattern is a versatile and efficient approach for solving problems involving dynamic data processing, optimization, and real-time analysis. As the name suggests, this pattern uses two heaps, which could be:

  • Two min heaps
  • Two max heaps
  • A combination of one min heap and one max heap

By exploiting the heap property, this pattern is often used to implement computationally efficient solutions.

Key Properties of Heaps

  • Insertion/Removal Complexity: O(log n), where n is the number of elements in the heap.
    • When adding a new element to the heap:
      • Insert it at the last position of the heap.
      • Restore the heap property by moving upward, comparing the element with its parent node (Heapify-Up).
  • Access Root Element: O(1).
    • Min Heap: Root stores the smallest element.
    • Max Heap: Root stores the largest element.
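These properties can be demonstrated with Python's heapq module (used here only as an illustration; heapq provides a min heap, so a max heap is simulated by negating values):

```python
import heapq

nums = [5, 1, 8, 3]

# Min heap: heapq is a min heap by default.
min_heap = nums[:]
heapq.heapify(min_heap)   # O(n) build; push/pop are O(log n)
assert min_heap[0] == 1   # O(1) access to the smallest element

# Max heap: store negated values so the largest comes out first.
max_heap = [-x for x in nums]
heapq.heapify(max_heap)
assert -max_heap[0] == 8  # O(1) access to the largest element
```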

Common Use Case

Given a dataset, the two heaps pattern can be used to divide it into two parts and efficiently find:

  • The smallest value from one part (using a min heap).
  • The largest value from the other part (using a max heap).

Other scenarios include finding:

  • The two largest numbers from two datasets (using two max heaps).
  • The two smallest numbers from two datasets (using two min heaps).

These examples highlight the pattern's flexibility for solving various problems by enabling quick access to minima and maxima as needed.


Examples

1. Sliding Window Median

Given an array of integers and a window size k, find the median of each sliding window of size k as it moves from left to right through the array.

2. Find Median of a Number Stream

Given a continuous stream of numbers, efficiently find the median of the numbers seen so far after any insertion.

  1. Insert the new number into the Max-Heap first.
  2. If the root of the Max-Heap is greater than the root of the Min-Heap, move the Max-Heap's root to the Min-Heap so every element in the lower half stays smaller than every element in the upper half.
  3. Rebalance the two heaps: the Max-Heap should be at most 1 element larger than the Min-Heap.
  4. Median calculation: if the two heaps are equal in size, return (Max-Heap root + Min-Heap root) / 2; if the Max-Heap is larger, return the root of the Max-Heap.
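The steps above can be sketched in Python (the class name MedianFinder is our own; heapq's min heap doubles as a max heap via negation):

```python
import heapq

class MedianFinder:
    """Max heap holds the lower half (negated), min heap holds the upper half."""

    def __init__(self):
        self.lower = []  # max heap via negated values
        self.upper = []  # min heap

    def insert(self, num):
        # Step 1: push onto the max heap first.
        heapq.heappush(self.lower, -num)
        # Step 2: move the max heap's root over so lower <= upper holds.
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        # Step 3: keep the max heap at most one element larger.
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def median(self):
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2
```

Each insertion costs O(log n) for the heap operations, and the median itself is read from the roots in O(1).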

Does Your Problem Match This Pattern?

Applicable Scenarios

The two heaps pattern applies if:

  1. Static or Streaming Data:
    • Linear Data: Input data is linear but not sorted. (If sorted, this pattern isn’t required.)
    • Stream of Data: Input is a continuous data stream.
  2. Maxima and Minima Calculation:
    • The input can be divided into two parts, requiring repeated calculations for:
      • Two maxima.
      • Two minima.
      • One maximum and one minimum.

Real-World Problems

1. Video Platforms

For demographic studies, calculate the median age of viewers efficiently as new users sign up for video streaming.

2. Gaming Matchmaking

Match players of similar skill levels for a balanced gaming experience. Use:

  • One heap for minimum skill levels.
  • One heap for maximum skill levels.

3. IPO Solution

Efficiently select projects to maximize profits with the following steps:

  1. Initialize a Min-Heap: Store the project capital requirements.
  2. Identify Feasible Projects: Find projects investable within the current capital range.
  3. Select the Most Profitable Project: Choose the project yielding the highest profit.
  4. Update Capital: Add the earned profit to the current capital.
  5. Repeat Until Target Achieved: Continue until k projects are selected.
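The steps above can be sketched in Python (the function name max_capital and its signature are our own, for illustration):

```python
import heapq

def max_capital(k, capital, capitals, profits):
    """Greedy IPO selection with a min heap of requirements and a max heap of profits."""
    # Step 1: min heap of (required capital, profit) pairs.
    projects = list(zip(capitals, profits))
    heapq.heapify(projects)
    available = []  # max heap of affordable profits (negated)

    for _ in range(k):
        # Step 2: release every project we can now afford.
        while projects and projects[0][0] <= capital:
            _, profit = heapq.heappop(projects)
            heapq.heappush(available, -profit)
        if not available:
            break  # no feasible project with the current capital
        # Steps 3-4: take the most profitable one and reinvest its profit.
        capital += -heapq.heappop(available)
    return capital

print(max_capital(2, 0, [0, 1, 1], [1, 2, 3]))  # → 4
```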

Introduction to In-Place Manipulation of a Linked List

· 3 minute read
Mr. Frugal
I'm afraid of paying full price.

In this guide, we'll explore the In-Place Manipulation of a Linked List pattern, its real-world applications, and the problems it helps solve.

About the Pattern

The in-place manipulation of a linked list allows us to modify a linked list without using additional memory. "In-place" refers to an algorithm that processes or modifies a data structure using only existing memory space, without requiring extra memory proportional to the input size.

This pattern is especially useful for problems that involve modifying the structure of a linked list, such as changing the order of node connections. For instance, certain problems require reversing a set of nodes or even the entire linked list. Instead of creating a new linked list with reversed links, we can achieve this in place without extra memory usage.

A naive approach to reversing a linked list involves traversing it and creating a new linked list with reversed links. This method has a time complexity of O(n), but it consumes O(n) extra space.

How can we implement in-place reversal efficiently?

We iterate over the linked list while keeping track of three nodes:

  • The current node
  • The next node
  • The previous node

Tracking these three nodes allows us to efficiently reverse the links between each pair of nodes. This in-place reversal operates in O(n) time and consumes only O(1) space.

Examples

Below are some problems that can be solved using this approach:

  1. Reverse the second half of a linked list

    • Given a singly linked list, reverse only the second half.
  2. Rotate a linked list clockwise k times

    • Given a singly linked list and an integer k, rotate the linked list clockwise k times.

Does Your Problem Match This Pattern?

Yes, if the following conditions are met:

  1. Linked list restructuring

    • The input is a linked list, and the task requires modifying its structure rather than the data in individual nodes.
  2. In-place modification

    • The changes must be made in place, meaning we cannot use more than O(1) additional space.

Real-World Applications

Many real-world problems utilize the in-place manipulation of a linked list pattern. Here are some examples:

1. File System Management

  • File systems often rely on linked lists to organize directories and files.
  • Operations like rearranging files within a directory can be efficiently handled by modifying the linked list in place.

2. Memory Management

  • In low-level programming and embedded systems, dynamic memory allocation and deallocation often involve manipulating linked lists of free memory blocks.
  • Tasks such as merging adjacent free blocks or splitting large blocks can be optimized using in-place operations.

Reverse Linked List Solution

Follow these steps to reverse a linked list in place:

  1. Initialize

    • Set prev and next pointers to NULL.
    • Set the curr (current pointer) to the head node.
  2. Traverse the linked list

    • Continue until the curr pointer reaches the end.
  3. Reverse pointers

    • Set the next pointer to the next node.
    • Reverse the current node’s pointer to point to the previous node.
  4. Update pointers

    • Move prev and curr forward.
  5. Update head

    • After traversal, prev points to the last node of the original list.
    • Set the head pointer to prev.

This efficient approach allows us to reverse a linked list in O(n) time using only O(1) space.
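The steps above can be sketched in Python (the minimal Node class is our own, used for illustration):

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def reverse(head):
    """Reverse a singly linked list in place: O(n) time, O(1) space."""
    prev, curr = None, head
    while curr:
        nxt = curr.next         # save the next node before overwriting the link
        curr.next = prev        # reverse the current node's pointer
        prev, curr = curr, nxt  # move both pointers forward
    return prev                 # prev ends at the last original node: the new head

# Build 1 -> 2 -> 3, reverse it, and read it back.
head = Node(1, Node(2, Node(3)))
rev = reverse(head)
values = []
while rev:
    values.append(rev.data)
    rev = rev.next
print(values)  # → [3, 2, 1]
```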