Blog

Optimizing the Input of Large Numbers of Integers

Content #

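#include <cstdio>

// Reads the next non-negative decimal integer from stdin into x.
void read(int &x) {
    int ch = getchar(); x = 0;  // int, not char, so EOF can be detected
    for (; ch != EOF && (ch < '0' || ch > '9'); ch = getchar());  // skip non-digits
    for (; ch >= '0' && ch <= '9'; ch = getchar()) x = x * 10 + (ch - '0');  // accumulate digits
}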

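This code implements a simple integer-reading function, read. It uses getchar() to consume the input one character at a time, skips characters until it reaches a digit, and then accumulates the consecutive digits into an integer. Compared with scanf or cin, the main advantages of this input optimization are: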

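  1. Avoiding stream synchronization:

By default, cin is kept synchronized with C stdio (see std::ios::sync_with_stdio), which can add overhead to every read. getchar() reads characters straight from the stdio input buffer and needs no such synchronization.

  2. Avoiding format-string parsing:

scanf has to parse its format string (such as "%d") on every call, which also takes time. read has no format string and processes the input characters directly, so it avoids this cost.

  3. Less input error handling:

When scanf or cin meets invalid input (such as a non-digit character), the call fails or the stream enters an error state that has to be handled explicitly. read simply loops past non-digit characters and processes only digits, so stray characters never stall it and no extra error handling is needed. The flip side is that malformed input is skipped silently rather than reported.

  4. Customizability:

read is a user-defined function, so you can modify it to suit a different input format or to do more complex input processing. scanf and cin, by contrast, have fixed behavior and are less flexible.

  5. Higher efficiency on large inputs:

When there is a large amount of input to process, read is usually faster than scanf and cin because it avoids the synchronization and format-string parsing overhead.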

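However, this input optimization also has limitations:

It only works for integer data. To read other types (such as floating-point numbers or strings), you may need to write additional reading functions.

If the input contains many non-digit characters, read may spend considerable time skipping them. In that case it may need to be combined with other optimizations (such as preprocessing the input) to improve efficiency.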
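One common way to go further is to load the whole input into memory once and parse integers straight from the buffer, which also makes it easy to handle a leading minus sign. The sketch below is not from the original post: read_all and next_int are hypothetical names, and it assumes whitespace-separated integers that fit in a long long.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: reads everything from `in` into one string.
std::string read_all(FILE *in) {
    std::string data;
    char chunk[1 << 16];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, in)) > 0) data.append(chunk, n);
    return data;
}

// Hypothetical helper: parses the next integer in buf starting at pos and
// advances pos past it. Returns false when no digit is left in the buffer.
bool next_int(const std::string &buf, std::size_t &pos, long long &out) {
    // Skip forward to the first digit.
    while (pos < buf.size() && (buf[pos] < '0' || buf[pos] > '9')) ++pos;
    if (pos == buf.size()) return false;
    // Assumption: a '-' directly before the first digit means a negative value.
    const bool negative = pos > 0 && buf[pos - 1] == '-';
    out = 0;
    for (; pos < buf.size() && buf[pos] >= '0' && buf[pos] <= '9'; ++pos)
        out = out * 10 + (buf[pos] - '0');
    if (negative) out = -out;
    return true;
}
```

A typical use would be to call read_all(stdin) once at the start of main and then call next_int in a loop until it returns false.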

From #

https://acm.hdu.edu.cn/showproblem.php?pid=6375

Question Formats (Moodle)

Formats #

Aiken #

Fairly intuitive but error-prone; it can only be used to create multiple-choice questions. Reference: http://docs.moodle.org/23/en/Aiken_Format

GIFT #

A GIFT file is a plain text file. The supported question types are:

  • multiple-choice,
  • true-false,
  • short answer,
  • matching,
  • missing word,
  • numerical questions

When entering questions in Chinese, make sure the file is saved as UTF-8. Reference: http://docs.moodle.org/23/en/GIFT. An online tool for generating GIFT files: http://a4esl.org/c/qw.html
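As a rough illustration of what a GIFT multiple-choice entry looks like, the C++ sketch below builds one as a string. It is not part of the original notes: gift_escape and gift_multiple_choice are hypothetical names, and the set of escaped control characters (~ = # { }) is an assumption taken from the Moodle GIFT documentation. UTF-8 text passes through unchanged, because only those ASCII characters are touched.

```cpp
#include <string>
#include <vector>

// Backslash-escapes the characters GIFT treats as control symbols.
// Assumption: the set ~ = # { } follows the Moodle GIFT documentation.
std::string gift_escape(const std::string &text) {
    std::string out;
    for (char c : text) {
        if (c == '~' || c == '=' || c == '#' || c == '{' || c == '}') out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: formats one multiple-choice question in GIFT syntax,
// marking the correct answer with '=' and each wrong answer with '~'.
std::string gift_multiple_choice(const std::string &title, const std::string &question,
                                 const std::string &correct,
                                 const std::vector<std::string> &wrong) {
    std::string out = "::" + gift_escape(title) + ":: " + gift_escape(question) + " {\n";
    out += "  =" + gift_escape(correct) + "\n";
    for (const std::string &w : wrong) out += "  ~" + gift_escape(w) + "\n";
    out += "}\n";
    return out;
}
```

The returned text can be written to a UTF-8 file and imported through the question bank.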

Missing Word Question #

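This format can be used for multiple-choice and short-answer questions. Reference: http://docs.moodle.org/23/en/Missing_word_question_format According to the official documentation, importing multiple-choice questions currently has problems.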

You can use the missing word format to import questions into both Moodle's Question bank and the Lesson activity.


pg_hba.conf

Syntax #

<connection-type> <database> <role> <remote-machine> <auth-method>

  1. connection-type

The type of connection, which is one of: local (a connection through an operating-system Unix-domain socket), host (a TCP/IP connection, either encrypted or not), hostssl (TCP/IP connections encrypted with SSL only), hostnossl (TCP/IP connections that are not encrypted with SSL).

  2. database

The name of a specific database that the line refers to, or the special keyword all, which means every available database. The special keyword replication matches replication connections, which are used to replicate data to another cluster.

...
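To make the field layout of a rule concrete, here is a minimal C++ sketch that splits one pg_hba.conf line into the five fields of the syntax above. It is an illustration, not how PostgreSQL parses the file: HbaRule and parse_hba_line are hypothetical names, and quoted values, trailing auth options, and a separate netmask column are deliberately ignored. Lines of type local carry no remote-machine field, so the address is left empty for them.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the five fields of one rule.
struct HbaRule {
    std::string type, database, role, address, method;
};

// Hypothetical helper: splits one pg_hba.conf line on whitespace.
// Returns false for blank lines, comment lines, or lines with too few fields.
bool parse_hba_line(const std::string &line, HbaRule &rule) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    for (std::string tok; in >> tok;) {
        if (tok[0] == '#') break;  // the rest of the line is a comment
        fields.push_back(tok);
    }
    if (fields.empty()) return false;
    const bool is_local = fields[0] == "local";
    const std::size_t need = is_local ? 4 : 5;  // local rules have no address
    if (fields.size() < need) return false;
    rule.type = fields[0];
    rule.database = fields[1];
    rule.role = fields[2];
    rule.address = is_local ? "" : fields[3];
    rule.method = fields[need - 1];
    return true;
}
```

For example, the line "host all luca 192.168.1.0/24 scram-sha-256" yields type host, database all, role luca, address 192.168.1.0/24, and method scram-sha-256; the address and method here are only sample values.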

Managing Roles(pgsql)

Content #

Create Role #

To be allowed to log in interactively, a role must also have the LOGIN option. The order of the options does not matter, so the following two statements are equivalent:

CREATE ROLE luca WITH LOGIN PASSWORD 'xxx';
CREATE ROLE luca WITH PASSWORD 'xxx' LOGIN;

Use VALID UNTIL to set the date and time after which the role's password is no longer valid:

CREATE ROLE luca WITH LOGIN PASSWORD 'xxx' VALID UNTIL '2030-12-25 23:59:59';

Using a role as a group #

A group is a role that contains other roles. Usually, when you want to create a group, all you need to do is create a role without the LOGIN option and then add all the members one after the other to the containing role. Adding a role to a containing role makes the latter a group.

...

Configuration files and parameters(pgsql)

Content #

postgresql.conf #

Every configuration parameter is associated with a context, and depending on the context, you can apply changes with or without a cluster restart. Available contexts are as follows:

  1. internal:

A group of parameters that are set at compile time and therefore cannot be changed at runtime.

  2. postmaster:

All the parameters that require the cluster to be restarted (that is, to kill the postmaster process and start it again) to activate them.

...

Tablespace(pgsql)

Content #

A tablespace is a directory that can be outside the PGDATA directory and can also belong to different storage. Tablespaces are mapped into the PGDATA directory by means of symbolic links stored in the pg_tblspc subdirectory. In this way, the PostgreSQL processes do not have to look outside PGDATA, but are still able to access “external” storage.

A tablespace can be used to achieve different aims, such as enlarging the available storage or providing different storage performance for specific objects. For instance, you can create a tablespace on a slow disk to contain infrequently accessed objects and tables, keeping fast storage within another tablespace for frequently accessed objects.

...
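Because the mapping is just a set of symbolic links, it can be inspected from outside the database. The C++17 sketch below lists the links under pg_tblspc together with their targets; list_tablespace_links is a hypothetical name, and the caller is assumed to have read permission on the PGDATA directory.

```cpp
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: maps each symlink name under <pgdata>/pg_tblspc
// to the path that the link points at.
std::map<std::string, std::string> list_tablespace_links(const fs::path &pgdata) {
    std::map<std::string, std::string> links;
    const fs::path dir = pgdata / "pg_tblspc";
    if (!fs::is_directory(dir)) return links;  // nothing to list
    for (const auto &entry : fs::directory_iterator(dir)) {
        if (entry.is_symlink())
            links[entry.path().filename().string()] = fs::read_symlink(entry.path()).string();
    }
    return links;
}
```

On a cluster that has no additional tablespaces the returned map is simply empty.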

Objects in the PGDATA directory

Content #

PostgreSQL does not name objects on disk, such as tables, in a mnemonic or human-readable way; instead, every file is named after a numeric identifier.

Internally, PostgreSQL holds a specific catalog that allows the database to match a mnemonic name to a numeric identifier, and vice versa. The integer identifier is named OID (Object Identifier); the name is a historical term, and today the number used in the file name is the so-called filenode.

...

Disk Layout of PGDATA

Content #

The main files are as follows:

  1. postgresql.conf

The main configuration file, used by default when the service is started.

  2. postgresql.auto.conf

The automatically included configuration file used to store dynamically changed settings via SQL instructions.

  3. pg_hba.conf

The HBA file that provides the configuration regarding available database connections.

  4. PG_VERSION

A text file that contains the major version number (useful when inspecting the directory to understand which version of the cluster has managed the PGDATA directory).

...
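Since PG_VERSION is a plain text file, a tool can check which major version owns a data directory before touching anything else. A minimal C++17 sketch follows; read_pg_version is a hypothetical name, and the version is assumed to be on the first line of the file.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: returns the first line of <pgdata>/PG_VERSION,
// or an empty string if the file cannot be opened.
std::string read_pg_version(const std::filesystem::path &pgdata) {
    std::ifstream in(pgdata / "PG_VERSION");
    std::string version;
    std::getline(in, version);  // leaves version empty when the open failed
    return version;
}
```

An empty result is a hint that the path is not a PGDATA directory or is not readable by the current user.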

The template databases

Content #

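After initdb finishes, the cluster contains template0 and template1. These two template databases are used to create new databases.

template1 is created first, and template0 is then cloned from template1. template0 generally serves as a safe backup and does not accept connections.

Use the following command to list the databases: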

psql -l

The postgres database is a common space to be used for connections instead of the template databases.


Processes(PostgreSQL)

Content #

$ ps -C postgres -af

  1. checkpointer

The process responsible for executing the checkpoints, which are points in time where the database ensures that all the data is actually stored persistently on the disk.

  2. background writer

Responsible for helping to push the data out of the memory to permanent storage.

  3. walwriter

Responsible for writing out the Write-Ahead Logs (WALs), the logs that are needed to ensure data reliability even in the case of a database crash.

...