Preliminary understanding of STL and string

What is STL?

STL (Standard Template Library): It is an important part of the C++ standard library. It is not only a reusable component library, but also a software framework that encompasses data structures and algorithms.

STL version updates

The original version, completed by Alexander Stepanov and Meng Lee at HP Labs, adhered to the open-source spirit, allowing anyone to freely use, copy, modify, distribute, and commercially use the code without payment.

The only condition was that it also needed to be used as open-source, just like the original version.

The HP version is the ancestor of all STL implementations.
The PJ version, developed by P. J. Plauger and inherited from the HP version, was adopted by Windows Visual C++.

It cannot be publicly released or modified, and its drawbacks include relatively low readability and unusual symbol naming.

The RW version, developed by Rogue Wave and inherited from the HP version, was adopted by C++Builder. It cannot be publicly released or modified, and its readability is average.

The SGI version, developed by Silicon Graphics Computer Systems, Inc. and inherited from the HP version, was adopted by GCC (Linux).

It has good portability, can be publicly released, modified, and even sold, and its naming and programming style makes it highly readable.

string class

Strings in C. In C, a string is a sequence of characters terminated by '\0'. For ease of manipulation, the C standard library provides a series of string functions.

However, these library functions are separate from the string data they operate on, which doesn’t quite align with OOP principles.

Furthermore, the underlying memory needs to be managed manually, and carelessness can lead to out-of-bounds access.
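
To make the contrast concrete, here is a minimal sketch that joins two strings, first with the C library and then with std::string. The function names join_c and join_cpp are invented for this illustration; they are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// C style: the caller computes the size, allocates, and must later call free().
// Forgetting the +1 for '\0' would write one byte out of bounds.
char* join_c(const char* a, const char* b)
{
	assert(a != nullptr && b != nullptr);
	std::size_t len = std::strlen(a) + std::strlen(b);
	char* out = static_cast<char*>(std::malloc(len + 1)); // +1 for '\0'
	if (out == nullptr)
		return nullptr;
	std::strcpy(out, a);
	std::strcat(out, b);
	return out; // the caller owns this memory
}

// C++ style: data and operations live in one class, which manages its own memory.
std::string join_cpp(const std::string& a, const std::string& b)
{
	return a + b;
}
```

With join_c, every call site has to remember to free the result; with join_cpp, the returned string releases its buffer automatically when it goes out of scope.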

#include <iostream>
#include <string>
using namespace std;//The snippets in this article assume these three lines

void test_string()
{
	string s1;//Default construction
	string s2("hello world");//Construct from a const char*
	string s3(s2);//Copy construction

	string s4(s2, 6, 5);//Copy 5 characters of s2 starting at index 6 ("world")
	//If the requested length runs past the end of s2, copying stops at the end of the string
	//If the third parameter is omitted, it copies to the end of the string by default


	string s5("hello world", 5);
	//Initialize with the first five characters of the C string ("hello")

	string s6(10, 'x');
	//Initialize with 10 copies of the character 'x'

	s2[0] = 'x';//operator[] returns a reference, so it can be used to modify the content

	size_t x = s2.size();//Returns the string length
}

Above, we briefly looked at some related usages with specific code examples. Combining this with what we’ve written before, we can see that the underlying logic is roughly as follows:

class String
{
private:
	char* _str;
	size_t _size;
	size_t _capacity;
};

There are three ways to traverse the container:

Subscript + []
Iterator
Range-based for

The first method works just like array access: you index into the string the same way you index into an array. The third method, range-based for, works on any container or array and is the simplest to use.

Here, we’ll focus on the most important one: iterators. Iterators provide a general way to access containers; every standard container can be traversed through them. They behave like pointers but are not necessarily pointers.

Forward iterators:

string::iterator it = s.begin();

Here we define an iterator it and initialize it with s.begin(), which returns an iterator to the first character. It can also be written as auto it = s.begin(); An iterator behaves like a pointer and can modify the elements of the container.

Note, however, that s.end() returns the position just past the last character, not the last character itself.

Reverse iterator:

string::reverse_iterator rit = s.rbegin();

Here rbegin() returns an iterator to the last character; conversely, rend() returns the position before the first character. Incrementing a reverse iterator (++) moves it toward the front of the string.

When we encounter a string definition like this:

const string s("hello world");

Because the string is const, we cannot use a regular iterator; we need a const iterator. Compared to a regular iterator, we only need to add the const_ prefix to the type name.

string::const_iterator cit = s.begin();

Note that const here does not freeze the iterator itself: it can still move, but the characters it refers to are read-only and cannot be written.
Like regular iterators, const iterators come in forward and reverse forms.

string::const_reverse_iterator dit = s.rbegin();

In summary, there are four types of iterators: iterator, reverse_iterator, const_iterator, and const_reverse_iterator. Here we only provide a brief introduction to iterators; we will gain a deeper understanding of their capabilities later.

void test_string()
{
	string s("hello world");

	//traverse containers
	//1.Index+[]
	for (size_t i = 0; i < s.size(); i++)
		cout << s[i] << " ";
		
	//2. Iterator
	//Forward iterators
	string::iterator it = s.begin();
	while (it != s.end())
	{
		cout << *it << " ";
		it++;
	}
	//Reverse Iterator
	string::reverse_iterator rit = s.rbegin();
	while (rit != s.rend())
	{
		cout << *rit << " ";
		rit++;
	}

	//Const Iterator
	const string s1("hello world");
	string::const_iterator cit = s1.begin();
	while (cit != s1.end())
	{
		cout << *cit << " ";
		cit++;
	}

	string::const_reverse_iterator dit = s1.rbegin();
	while (dit != s1.rend())
	{
		cout << *dit << " ";
		dit++;
	}

	//3.Range-based for
	//Range-based for works on any container or array
	for (auto ch : s)
	{
		cout << ch << " ";
	}
}
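
Because every standard container exposes begin() and end(), a traversal can be written once and reused. The sketch below assumes a helper called join_elements (a name invented here) and shows the same iterator loop working on a string, a vector, and a list.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>
#include <vector>

// Walks any container through its iterators and joins the elements with spaces.
// The function never needs to know how the container stores its data.
template <class Container>
std::string join_elements(const Container& c)
{
	std::ostringstream out;
	typename Container::const_iterator it = c.begin();
	while (it != c.end())
	{
		if (it != c.begin())
			out << ' ';
		out << *it;
		++it;
	}
	return out.str();
}
```

For example, join_elements(std::string("abc")) and join_elements(std::vector<int>{1, 2, 3}) both go through the same loop, even though a string and a vector are different types.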

Additional information: the auto keyword

Here are two additional C++11 syntax tips.

In C and in C++ before C++11, `auto` marked a local variable as having automatic storage; since that is already the default for local variables, the keyword was rarely written. In C++11, `auto` has a completely new meaning: it’s no longer a storage class specifier, but a type specifier that instructs the compiler to deduce the type of the declared variable at compile time.

When declaring pointer types, `auto` and `auto*` are interchangeable, but when declaring reference types, `auto&` is required. When declaring multiple variables on the same line, all of them must deduce to the same type; otherwise, the compiler reports an error, because a single declaration can have only one deduced type.

Note: before C++20, `auto` cannot be used as a function parameter; since C++14 it can be used as a return type, but caution is advised. It cannot be used directly to declare arrays.

#include <typeinfo>//Required for typeid

void test()
{
	int a = 1;
	auto b = a;
	auto c = 'd';
	auto d = 3.12;
	//Cannot be written as auto e; without an initializer the compiler has nothing to deduce the type from

	//typeid shows the type of a variable (the exact text printed by name() depends on the compiler)
	cout << typeid(b).name() << endl;
	cout << typeid(c).name() << endl;
	cout << typeid(d).name() << endl;

	//auto cannot declare arrays directly, e.g. auto arr[] = {1, 2, 3}; is an error
	//void func(auto a) is an error before C++20: auto cannot be used as a parameter type
	//auto func() is allowed since C++14 and deduces the return type, but it is recommended to use it with caution
}
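
The pointer, reference, and same-line rules can be checked with a small sketch. The function name auto_rules_demo and the variable names are only for illustration; the static_assert lines make the compiler confirm each deduced type.

```cpp
#include <cassert>
#include <type_traits>

// Returns true if every deduction below behaves as described in the text.
bool auto_rules_demo()
{
	int x = 10;

	auto  p1 = &x;  // deduced as int*
	auto* p2 = &x;  // also int*: auto and auto* are interchangeable for pointers
	auto  copy = x; // plain auto drops the reference: copy is a separate int
	auto& ref = x;  // auto& is required to get a reference to x

	static_assert(std::is_same<decltype(p1), int*>::value, "p1 is int*");
	static_assert(std::is_same<decltype(p2), int*>::value, "p2 is int*");
	static_assert(std::is_same<decltype(copy), int>::value, "copy is int");
	static_assert(std::is_same<decltype(ref), int&>::value, "ref is int&");

	copy = 20; // does not touch x
	ref = 30;  // modifies x through the reference

	auto a = 1, b = 2; // OK: both deduce to int
	// auto c = 1, d = 2.0; // error: int for c but double for d

	return x == 30 && copy == 20 && *p1 == 30 && *p2 == 30 && a + b == 3;
}
```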

Below, code examples give a general picture of the size-related and capacity-related member functions of string:

void test_string()
{
	string s("hello world");
	//The difference between length and size:
	//length is specific to string, while size is universal: almost every container has a size method.
	//Both do the same thing and return the number of characters in the string, but size is preferred.
	cout << s.length() << endl;
	cout << s.size() << endl;
	
	//capacity: returns the string capacity
	cout << s.capacity() << endl;
	
	
	//reserve: allocate space in advance to avoid repeated expansion and improve efficiency.
	//A request smaller than the current size never truncates the string. Whether a request smaller than the current capacity
	//actually shrinks it depends on the implementation (since C++20, reserve never shrinks). The contents are unaffected either way.
	string s2;
	s2.reserve(100);//Reserve room for at least 100 characters; appending up to that many will not trigger expansion

	//resize: change the size of the string, larger or smaller. When enlarging, '\0' is the default padding, or a padding character can be specified
	s2.resize(10);//Now 10 characters, filled with '\0' by default


	//clear: remove the data; the capacity is generally left unchanged
	s2.clear();


	//empty: check whether the string is empty
	if (s2.empty())
		cout << "s2 is empty" << endl;
}
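
To see what reserve buys us, the sketch below appends characters one at a time and counts how often capacity() changes, with and without reserving first. The helper name count_reallocations is made up for this example, and the exact count without reserve depends on the standard library's growth policy, so only the comparison is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends n characters one at a time and counts how often capacity() changes.
// Each change means the string had to allocate a larger buffer.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
	std::string s;
	if (reserve_first)
		s.reserve(n); // one allocation up front

	std::size_t changes = 0;
	std::size_t last = s.capacity();
	for (std::size_t i = 0; i < n; ++i)
	{
		s.push_back('x');
		if (s.capacity() != last)
		{
			++changes;
			last = s.capacity();
		}
	}
	return changes;
}
```

Calling count_reallocations(1000, true) returns 0, because reserve guarantees room for at least 1000 characters, while count_reallocations(1000, false) returns some implementation-dependent number of at least 1.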

Regarding insert: minimize its use. Inserting at the front or in the middle has to shift every character after the insertion point, so heavy use can lead to efficiency issues.

void test_string()
{
	string s = "1234567";

	s.push_back('8');//Append a single character at the end of the string
	
	s.append("000");//Append a string at the end of the string
	s += "000";
	//append does the same thing as +=; the += operator is recommended because it is more concise


	s.insert(3, "xxx");//Insert the string "xxx" before index 3
	s.insert(0, "hello");//Inserting at the head works this way, but the efficiency is relatively low

}
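
One common way to avoid repeated head insertion is to append at the end and reverse once. The sketch below converts a number to its decimal digits both ways; the function names to_digits_front_insert and to_digits_append are invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Converts n to decimal by inserting each digit at the front.
// Every insert at begin() shifts all existing characters one step to the right.
std::string to_digits_front_insert(unsigned n)
{
	std::string s;
	do
	{
		s.insert(s.begin(), static_cast<char>('0' + n % 10));
		n /= 10;
	} while (n != 0);
	return s;
}

// Same result, but each digit is appended and the string is reversed once at the end.
std::string to_digits_append(unsigned n)
{
	std::string s;
	do
	{
		s += static_cast<char>('0' + n % 10);
		n /= 10;
	} while (n != 0);
	std::reverse(s.begin(), s.end());
	return s;
}
```

For a handful of digits the difference is negligible; the point is that the second version does a constant amount of work per character, while the first does work proportional to the current length on every insertion.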

Regarding the use of erase: use it with caution too. Erasing from the front or middle shifts all the characters that follow, so it can also cause efficiency issues.

void test_string()
{
	string s = "1234567";

	s.erase(6, 1);//Starting at index 6, delete one character -> "123456"

	s.erase(0, 1);
	s.erase(s.begin());//Either of these deletes the first character -> "3456"

	s.erase(--s.end());
	s.erase(s.size() - 1, 1);//Either of these deletes the last character -> "34"

	s = "1234567";//Reset first: erase(pos) throws std::out_of_range when pos > size()
	s.erase(3);//With only a position, delete everything from index 3 to the end -> "123"

}
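
When many characters have to go, erasing them one by one shifts the tail again and again. A usual alternative is the erase-remove idiom with std::remove from <algorithm>, sketched below next to the one-by-one version. The function names remove_all and remove_all_naive are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Removes every occurrence of ch from s in one pass.
// std::remove moves the kept characters forward; erase then trims the leftover tail.
void remove_all(std::string& s, char ch)
{
	s.erase(std::remove(s.begin(), s.end(), ch), s.end());
}

// The one-by-one alternative: each erase shifts everything after pos.
void remove_all_naive(std::string& s, char ch)
{
	std::size_t pos = s.find(ch);
	while (pos != std::string::npos)
	{
		s.erase(pos, 1);
		pos = s.find(ch, pos); // the next character has slid into pos
	}
}
```

Both functions leave the string in the same state; the first simply touches each character once.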

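Regarding replacing content in a string: replace can also easily lead to efficiency issues, because when the new text and the replaced text differ in length, the characters after them have to be shifted.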

void test_string7()
{
	string s = "hello world hello china";
	s.replace(5, 1, "%%");//Replace 1 character starting at index 5 with "%%"

	//string::npos is the largest value of size_t; find returns it when nothing is found

	size_t pos = s.find("hello");//Search for the first occurrence of "hello"; by default the search starts at index 0


	//Replace spaces with %%
	//Method 1:
	size_t ss = s.find(' ');
	while (ss != string::npos)
	{
		s.replace(ss, 1, "%%");
		ss = s.find(' ', ss + 2);//Resume the search just past the two characters that were inserted
	}
	//Method 2:
	string tmp;
	tmp.reserve(s.size());//Reserve space in advance to reduce expansion
	for (auto ch : s)
	{
		if (ch == ' ')
			tmp += "%%";
		else
			tmp += ch;
	}
	swap(tmp, s);



	size_t pos1 = s.find("he", 5);//Search for "he" starting from index 5
	size_t pos2 = s.rfind("he");//Search from back to front, i.e. find the last occurrence of "he"

	string sf = s.substr(6, 5);//Starting at index 6, take 5 characters and return them as a new string. If the second parameter is omitted or runs past the end, it takes everything up to the end of the string


	//find_first_of: find the first position that holds any one of the given characters
	string str("Please, replace the vowels in this sentence by asterisks.");
	size_t found = str.find_first_of("abcd");//First position that holds any of 'a', 'b', 'c', or 'd'
	while (found != string::npos)
	{
		str[found] = '*';
		found = str.find_first_of("abcd", found + 1);//Continue searching after the position just replaced
	}
	//find_last_of: similar to find_first_of, but searches from back to front
}
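
find and substr are often combined to cut a string into pieces. Here is a minimal sketch of that pattern; the function name split is an assumption for this example, since std::string has no built-in split.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits text on a single delimiter character using find and substr.
std::vector<std::string> split(const std::string& text, char delim)
{
	std::vector<std::string> parts;
	std::size_t start = 0;
	std::size_t pos = text.find(delim);
	while (pos != std::string::npos)
	{
		parts.push_back(text.substr(start, pos - start));
		start = pos + 1;               // skip the delimiter
		pos = text.find(delim, start); // continue after it
	}
	parts.push_back(text.substr(start)); // the remainder, up to the end
	return parts;
}
```

The loop condition compares against string::npos exactly as in the replacement loop above, and the final substr call relies on the rule that omitting the length takes everything up to the end.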

A simulated implementation of the underlying layer:

Here we briefly walk through a simulated implementation of string, written with encapsulation in mind. Writing it ourselves gives a clearer understanding of the operating principle.

Regarding encapsulation: the design of iterators is itself a form of encapsulation. It hides the underlying implementation details and provides a unified way to access containers.

The same traversal code therefore works across containers, without the caller needing to care about a container’s underlying structure or implementation details.

Each container has already wrapped its own underlying layer, so the iterator can be used directly.

#include <cassert>
#include <cstring>

namespace yyyy
{
	class string
	{
	public:
		typedef char* iterator;//Encapsulate an iterator
		iterator begin()
		{
			return _str;
		}
		iterator end()
		{
			return _str + _size;
		}//Implementing iterators: here the iterator is simply a raw pointer
		
		//A separate no-argument constructor would look like this. It is commented out because the
		//constructor below already has a default argument, and keeping both makes `string s;` ambiguous
		//string()
		//	:_str(new char[1]{'\0'})//Cannot be a null pointer; allocate at least one character for '\0'
		//	,_size(0)
		//	,_capacity(0)
		//{}

		//Short and frequently called functions can be defined directly in the class, where they are implicitly inline
		string(const char* str = "")//The default cannot be nullptr: strlen(nullptr) is undefined behavior
		{
			_size = std::strlen(str);
			//_capacity does not include '\0'
			_capacity = _size;
			_str = new char[_capacity + 1];
			std::strcpy(_str, str);//Copy the string, including the terminating '\0'
		}

		~string()
		{
			delete[] _str;
			_str = nullptr;
			_size = _capacity = 0;
		}

		const char* c_str() const
		{
			return _str;
		}

		size_t size() const
		{
			return _size;
		}

		char& operator[](size_t pos)
		{
			assert(pos < _size);//Assert to catch out-of-bounds access
			return _str[pos];
		}

		const char& operator[](size_t pos) const
		{
			assert(pos < _size);//Assert to catch out-of-bounds access
			return _str[pos];
		}
	private:
		char* _str;//Point to the first address of the string space
		size_t _size;//The length of a string
		size_t _capacity;//The capacity of a string
	};
}

This completes our simplest simulated implementation.
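
One consequence of providing begin() and end() is that range-based for works on our own class, because the compiler rewrites the loop into calls to those two functions. The sketch below is a cut-down, self-contained variant for illustration only: the names demo, mini_string, and to_upper are hypothetical, and copying is disabled to keep the example short (a default copy would make two objects delete the same buffer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace demo // hypothetical namespace, independent of the class above
{
	class mini_string
	{
	public:
		typedef char* iterator;

		mini_string(const char* str = "")
		{
			_size = std::strlen(str);
			_str = new char[_size + 1]; // +1 for '\0'
			std::strcpy(_str, str);
		}
		~mini_string() { delete[] _str; }

		// Copying is disabled in this sketch to avoid a double delete.
		mini_string(const mini_string&) = delete;
		mini_string& operator=(const mini_string&) = delete;

		iterator begin() { return _str; }
		iterator end() { return _str + _size; }
		std::size_t size() const { return _size; }
		const char* c_str() const { return _str; }

	private:
		char* _str;
		std::size_t _size;
	};
}

// Upper-cases ASCII letters in place with range-based for,
// which only needs begin() and end() to exist.
void to_upper(demo::mini_string& s)
{
	for (char& ch : s)
	{
		if (ch >= 'a' && ch <= 'z')
			ch = static_cast<char>(ch - 'a' + 'A');
	}
}
```

Passing a mini_string holding "hello world" to to_upper changes its contents to "HELLO WORLD", even though the class never mentions range-based for anywhere.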