Yinglian Xinhua (Three Works), Traditional Culture Cultivation Series, Third Sequel: pdf / mobi / txt 2024 ebook download
Synopsis:
The authors are Zhu Yinggao, Chen Fangyong, and Lei Jin, three writers of lianhua (couplet anecdotes) from the late Qing and early Republican era, whose works are collected here as the third sequel to the Traditional Culture Cultivation Series. After the Yinglian Conghua, imitations proliferated: besides Wu Gongheng's Duilian Hua, the titles styled Yinglian Xinhua alone number no fewer than several. This volume collects three such Yinglian Xinhua by late Qing and Republican authors. Their format resembles the Conghua, but they contain much material the Conghua lacks; Lei Jin's Xinhua, for example, gathers material from notebooks, unofficial histories, and the press covering the span from the Opium War to the early Republic, and was praised as "recording everything without omission." Apart from conversion to simplified characters and horizontal typesetting, the content and form of the originals have been preserved; obvious misprints have been corrected, and the text lightly regularized for readability and clean layout.
Highlighted short reviews:
Reviewer: Ina小猫儿, posted 2022-11-14 10:14:01
A heart as high as heaven, a fate as thin as paper. Also, I just don't get it: the marriage had broken down to the point of divorce proceedings, and she still got pregnant?
Reviewer: 御手洗黑, posted 2020-12-25 18:10:18
Four stars for sentiment. If you have read a fair amount of Shimada, you can "guess" the trick almost immediately (it isn't an old device reused, just wildly imaginative). As for problems: the ghost army at the opening never gets a proper explanation (is "mental derangement" supposed to explain everything?), and the second issue involves spoilers, so I will leave it out. Even so, the warmth of the ending is moving.
Reviewer: 财神5.0, posted 2021-11-29 14:42:00
Light reading on couplets.
Reviewer: 大熊乖乖, posted 2023-07-28 21:19:14
Unremarkable; mindful eating.
Reviewer: 銀ガラス_YJ, posted 2017-10-03 09:34:32
I actually haven't read the top-level techniques section yet, but that doesn't stop me from recommending this book. If you love sudoku but the hard puzzles keep stopping you cold, then even if you have no plans to compete, this book will absolutely help you get more fun out of the game. The top-level techniques may be hard to master, but you should at least master the advanced-techniques chapter; those are indispensable practical tools.
Reviewer: 热带夜, posted 2019-12-10 01:35:59
My apologies, everyone: reading this was like advanced calculus for me. Compared with Foucault's doctoral thesis, early Deleuze is genuinely opaque, and that should not be only the reader's problem. If a piece of writing cannot reach most of its audience, what is its point?
In-depth reviews:
On "Tricks": the best love, and the worst
Reviewer: 我累特, posted 2021-04-07 09:02:06
The story takes a Shakespeare play as its starting point and ends on a Shakespearean absurdity. Compared with the much-praised opening story, "Runaway," I'm fonder of "Tricks." The heroine, Robin, gets relatively few pages, but she may be one of the few women in the book with a tough sense of self, who depends on no man at all and lives on her own terms.
The plot of "Tricks" is actually simple:
Robin, a sentimental and intelligent young woman, leaves home once every summer, taking the train to a nearby town just to see a Shakespeare play.
One year, absorbed in the performance, she leaves her purse in the theatre. She only notices at dinnertime, and while hurrying back to look for it she runs into a man and his dog.
When the man learns what has happened, he kindly offers to help and brings Robin back to his shop (he runs a clock shop). He talks with her gently and even cooks her a rather exotic meal. He is a foreigner named Daniel, from Yugoslavia.
After dinner he sees Robin to the station, and at the moment of parting they kiss. They make a pact: in a year, if Robin still feels the same, she will come to the same place, wearing the same dress, and they will meet again. (Honestly, at this point I thought I was looking at the prototype of the Before trilogy.)
Robin has never been in love; in other people's eyes she is indifferent to romance and keeps passing over the men around her. Now, dazed by this sudden encounter, she spends the following year carrying the secret, happy and private. She starts paying attention to that distant country, reading its history, fixating on news and names far removed from her own life.
At last, a year later, she returns to the shop door in her green dress, only to find a cold Daniel who acts as if he doesn't know her and finally shuts the door in her face. (Reading this, I assumed it was another classic "the joke was on me all along" story.)
Robin cries all the way home and is laughed at by her sister. From that moment she resolves to pour all her feeling into work, and decades later she has become a psychiatrist. She goes on following her own heart: she never marries, refuses to be set up with men she feels nothing for, lives with a few foreigners and sees a married man or two, but feels no deep passion for any of them.
Later still, the hospital admits three old men who need extra examination. Robin discovers that one of them, lying frail in his bed, is the man who slammed the door in her face all those years ago, the man who haunted her for so long. But she immediately sees that the name on his file is Alexander.
So his name was never Daniel; perhaps everything about him was false. Robin sinks into endless regret, asks the staff for a copy of his file, and hurries home to read it. Only then does she discover the enormous joke fate has played on her.
The man in the bed is not the man who once kissed her and made the appointment, but his twin brother. Just as Robin has to care for her sickly sister Joanne, that man had a twin brother too: a deaf-mute, withdrawn, unwilling to deal with people. The man Robin saw that day was not Daniel but his brother Alexander, and Daniel himself had died years before.
The one who shut the door because he wanted no human contact was never Daniel. Robin feels she has landed inside one of Shakespeare's absurd plays, blindsided by that classic cheat of a plot device, the twins.
Why wasn't Daniel in the shop that day? Was he out walking the dog? If Robin had worn a different dress, or stayed to the end of the play, or gone straight to find him when she stepped off the train, everything might have turned out differently. But they had only that one brief meeting, with no time to really know each other. If they had met as agreed, would Robin's fate have changed? She will never know.
One reason I love this story so much is that, in a few calm and brief passages, it draws a woman who keeps following her own heart through changing times: strong, with a mind of her own.
From the start, Robin's insistence on travelling far from home to see a play, for the play's own sake, shows her sharp feeling for Shakespeare and for theatre itself. The first play she sees is King Lear; the tragedy leaves her brooding for the whole next day, and she decides that from then on she will always come alone.
Reading the first half, I thought this would be another Madame Bovary; it quickly proves otherwise. Robin simply wants to go out decently and freely and claim a little time of her own. Of course she has her vanities, and some naive, fragmentary images of the good life: she wants to eat somewhere with snow-white tablecloths, sunlight, and fresh flowers on the table.
But no man ever figures in her imaginings; she does not need a man to rescue her. This foreign man who appears out of nowhere gives Robin a kind of romance she has never known, and the love itself is mixed with a longing for him as a stranger from elsewhere: through him Robin hopes to learn about a wider world, the one she has no means to go out and explore.
The book does not spend much time on why Robin stays single; it treats her singleness as her own vague but firm choice. She wants to remain single because it leaves her at ease. A man who appears suddenly and then cannot be seen for a whole year suits Robin exactly. Another girl might well have refused, but what Robin likes is precisely this long-distance love, with room to think and imagine, to know the person from another angle; the year passes in the blink of an eye.
Even after his brother slams the door in her face, Robin does not instantly fall apart, or wallow, or curse, or decide that no man is any good. In this romance cut short she finds her true passion, throws herself into her work, and becomes a highly capable psychiatrist. The second half of the story is wonderful on this point: as the times change, women begin wearing looser clothes and shorter hair, and a single woman eating out or going to the theatre is no longer anything remarkable, which is exactly what Robin had been doing for years.
Yet Robin stays ahead of her time. Her involvements with men are not even driven by expectation or emotional dependence; some are out of kindness, some purely for the pleasure of it.
In today's blunt language, casual flings. With the influx of foreign workers and immigrants, Robin half-deliberately takes up with many foreigners; whether she hopes to understand the world through them, or is clinging to a certain feeling, we cannot tell.
Still, she keeps waiting for an answer: why did the Yugoslav she met only once shut her out? When she finally gets the truth, she finds that fate played a trick on her: the two men were twins. After a year, with his image blurred in memory, Robin of course could not tell apart two men so alike in face and dress. She never imagined such a long shot would land on her, and she too thinks about fate: if she had changed any one thing back then, would her life have been different? I doubt it. Robin keeps insisting she would never marry a foreigner, but at the very least she would have had a love worth being moved by.
Like Madame Bovary, she crosses to another town to meet her lover, but Robin's choices are nothing like Emma's. Strictly speaking, the whole affair takes place inside her head. Of all the heroines in the book, I find Robin the closest to the women around me, even to myself. Her flaws are glaring: she idealizes love too much, has no practical experience, deals with people entirely on instinct, trusts men she likes too readily, and throws herself into a romance without ever once imagining its absurd side. Then again, that is also her strength.
It is hard to say whether what I see in this story is a woman who could truly make love work, or one destined to wreck her own love life, but Robin fits the condition of the modern woman almost perfectly.
That includes her suspicion, her meticulous, from-the-start need to get to the bottom of everything; she wants the truth even when the truth stings. I love this story: after the heavy blow it loops you back around, leaving a strange smoky, scorched aftertaste. I have read plenty of analyses and still find it hard to grasp how Alice Munro defines love, but from this story I get a faint sense of it: like fate, love is a thing that shifts moment to moment, like running water or a rainbow, elusive and two-faced. The instant you change course, the instant you stop, the instant you decide to press on, love forks into countless paths. Yet for the most steadfast, love and fate tend to bend to their will, as they should; and so it is for Robin.
"The world they had entered then was another world, as fictional as any conjured on a stage. Their fragile arrangement, their ritual kiss, the reckless confidence that ruled them, their single-minded faith that everything would unfold as planned: in so precarious a design, a shift one inch this way or that and the whole thing falls through. Some of Robin's patients believe that combs and toothbrushes must lie in a fixed order, that shoes must point the right way, that steps must be counted exactly, or punishment will follow. If she failed in this, it must have been because of the green dress: because of the sick child of the woman at the laundry, she had worn the wrong green dress."
Hard to say whether that green dress was truly a trick of fate, or whether heaven already knew how much her heart craved freedom.
Clark in "Runaway" met the worst kind of love, but what Robin met was the best: at least for her life and her growth, it was the love that set her most free.
English chapter summaries, copied from GitHub
Reviewer: 衝, posted 2018-06-20 09:35:11
Since the Chinese translation is unreadable, I am copying over a set of notes from GitHub; the original is here. Note: some entries in the original are not very coherent and some excerpts are not that interesting, so the notes are slightly abridged.
Prologue
Chapter 1: The Machine Learning Revolution
Chapter 2: The Master Algorithm
Chapter 3: Hume's Problem of Induction
Chapter 4: How Does Your Brain Learn?
Chapter 5: Evolution: Nature's Learning Algorithm
Chapter 6: In the Church of Reverend Bayes
Chapter 7: You Are What You Resemble
Chapter 8: Learning Without a Teacher
Chapter 9: The Pieces Of The Puzzle Fall Into Place
Chapter 10: This Is The World Of Machine Learning
Prologue
But learning algorithms are artifacts that design other artifacts.
Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic.
Connectionists reverse engineer the brain and are inspired by neuroscience and physics.
Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology.
Bayesians believe learning is a form of probabilistic inference and have their roots in statistics.
Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.
On the contrary, what it requires is stepping back from the mathematical arcana to see the overarching pattern of learning phenomena; and for this the layman, approaching the forest from a distance, is in some ways better placed than the specialist, already deeply immersed in the study of particular trees.
Chapter 1: The Machine Learning Revolution
Scientists make theories, and engineers make devices. Computer scientists make algorithms, which are both theories and devices.
Learning algorithms are the seeds, data is the soil, and the learned programs are the grown plants.
The Industrial Revolution automated manual work and the Information Revolution did the same for mental work, but machine learning automates automation itself. Without it, programmers become the bottleneck holding up progress.
In retrospect, we can see that the progression from computers to the Internet to machine learning was inevitable: computers enable the Internet, which creates a flood of data and the problem of limitless choice; and machine learning uses the flood of data to help solve the limitless choice problem.
Chapter 2: The Master Algorithm
All knowledge—past, present, and future—can be derived from data by a single, universal learning algorithm, called the Master Algorithm.
Thus it seems that evolution kept the cerebellum around not because it does something the cortex can't, but just because it's more efficient.
If something exists but the brain can't learn it, we don't know it exists. We may just not see it or think it's random.
But if everything we experience is the product of a few simple laws, then it makes sense that a single algorithm can induce all that can be induced.
Biology, in turn, is the result of optimization by evolution within the constraints of physics and chemistry,
Humans are good at solving NP problems approximately, and conversely, problems that we find interesting (like Tetris) often have an "NP-ness" about them.
To use a technology, we don't need to master its inner workings, but we do need to have a good conceptual model of it.
The analogizers' master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions.
Chapter 3: Hume's Problem of Induction
The rationalist likes to plan everything in advance before making the first move. The empiricist prefers to try things and see how they turn out.
You could be super-Casanova and have dated millions of women thousands of times each, but your master database still wouldn't answer the question of what this woman is going to say this time. How about we just assume that the future will be like the past? This is certainly a risky assumption. (It didn't work for the inductivist turkey.) On the other hand, without it all knowledge is impossible, and so is life.
This result, known as the "no free lunch" theorem, sets a limit on how good a learner can be. The limit is pretty low: no learner can be better than random guessing! Now, you have the "no free lunch" theorem. Pick your favorite learner. For every world where it does better than random guessing, I will deviously construct one where it does worse by the same amount. All I have to do is flip the labels of all unseen instances.
Tom Mitchell, a leading symbolist, calls it "the futility of bias-free learning." In ordinary life, bias is a pejorative word: preconceived notions are bad. But in machine learning, preconceived notions are indispensable; you can't learn without them. In fact, preconceived notions are also indispensable to human cognition, but they're hardwired into the brain, and we take them for granted. It's biases over and beyond those that are questionable.
Learning is forgetting the details as much as it is remembering the important parts. Learning is a race between the amount of data you have and the number of hypotheses you consider. More data exponentially reduces the number of hypotheses that survive, but if you start with a lot of them, you may still have some bad ones left at the end.
Accuracy on held-out data (i.e., what is commonly called the validation set) is the gold standard in machine learning.
For example, we can subtract a penalty proportional to the length of the rule from its accuracy and use that as an evaluation measure.
The preference for simpler hypotheses is popularly known as Occam's razor, but in a machine-learning context this is somewhat misleading. "Entities should not be multiplied beyond necessity," as the razor is often paraphrased, just means choosing the simplest theory that fits the data.
You can estimate the bias and variance of a learner by comparing its predictions after learning on random variations of the training set. If it keeps making the same mistakes, the problem is bias, and you need a more flexible learner (or just a different one). If there's no pattern to the mistakes, the problem is variance, and you want to either try a less flexible learner or get more data.
For each pair of facts, we construct the rule that allows us to infer the second fact from the first one and generalize it by Newton's principle. When the same general rule is induced over and over again, we can have some confidence that it's true.
This contrasts with traditional chemotherapy, which affects all cells indiscriminately. Learning which drugs work against which mutations requires a database of patients, their cancers' genomes, the drugs tried, and the outcomes. For these, the symbolist algorithm of choice is decision tree induction. Decision trees instead ensure a priori that each instance will be matched by exactly one rule.
A single concept implicitly defines two classes: the concept itself and its negation. (For example, spam and nonspam.) Classifiers are the most widespread form of machine learning.
So to learn a good decision tree, we pick at each node the attribute that on average yields the lowest class entropy across all its branches, weighted by how many examples go into each branch.
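The node-splitting criterion just described can be sketched in a few lines; `entropy` and `info_gain` are hypothetical helper names, and the weather-style toy data is made up:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr):
    """Reduction in class entropy from splitting `examples` on `attr`.
    Each example is (attributes_dict, label)."""
    labels = [y for _, y in examples]
    branches = {}
    for x, y in examples:
        branches.setdefault(x[attr], []).append(y)
    # Weighted average entropy of the branches after the split.
    remainder = sum(len(b) / len(examples) * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

# Toy data: 'outlook' separates the classes perfectly, 'windy' does not.
data = [({"outlook": "sun", "windy": 0}, "play"),
        ({"outlook": "sun", "windy": 1}, "play"),
        ({"outlook": "rain", "windy": 0}, "stay"),
        ({"outlook": "rain", "windy": 1}, "stay")]
best = max(["outlook", "windy"], key=lambda a: info_gain(data, a))
print(best)  # outlook
```

At each node a tree learner repeats this choice on the examples that reach that node.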
The psychologist David Marr argued that every information processing system should be studied at three distinct levels: the fundamental properties of the problem it's solving; the algorithms and representations used to solve it; and how they are physically implemented.
Sets of rules and decision trees are easy to understand, so we know what the learner is up to. This makes it easier to figure out what it's doing right and wrong, fix the latter, and have confidence in the results.
Connectionists, in particular, are highly critical of symbolist learning. According to them, concepts you can define with logical rules are only the tip of the iceberg; there's a lot going on under the surface that formal reasoning just can't see, in the same way that most of what goes on in our minds is subconscious.
Chapter 4: How Does Your Brain Learn?
Brains can perform a large number of computations in parallel, with billions of neurons working at the same time; but each of those computations is slow, because neurons can fire at best a thousand times per second. Some neurons have short axons and some have exceedingly long ones, reaching clear from one side of the brain to the other. Placed end to end, the axons in your brain would stretch from Earth to the moon.
If one of the memories is the pattern of black-and-white pixels formed by the digit nine and the network sees a distorted nine, it will converge to the "ideal" one and thereby recognize it.
Rather than a logic gate, a neuron is more like a voltage-to-frequency converter. The curve of frequency as a function of voltage is S-shaped (the figure is not reproduced in these notes).
Hyperspace is a double-edged sword. On the one hand, the higher dimensional the space, the more room it has for highly convoluted surfaces and local optima. On the other hand, to be stuck in a local optimum you have to be stuck in every dimension, so it's more difficult to get stuck in many dimensions than it is in three.
Neural networks are not compositional, and compositionality is a big part of human cognition. Another big issue is that humans—and symbolic models like sets of rules and decision trees—can explain their reasoning, while neural networks are big piles of numbers that no one can understand.
Chapter 5: Evolution: Nature's Learning Algorithm
The key input to a genetic algorithm, as Holland's creation came to be known, is a fitness function (a particular type of objective function used to summarise how close a given design solution is to achieving the set aims; fitness functions are used in genetic algorithms to guide simulations towards optimal design solutions). Given a candidate program and some purpose it is meant to fill, the fitness function assigns the program a numeric score reflecting how well it fits the purpose.
These rule systems, which Holland called classifier systems, are one of the workhorses of the machine-learning tribe of evolutionaries. Like multilayer perceptrons, classifier systems face the credit-assignment problem (what is the fitness of rules for intermediate concepts?), and Holland devised the so-called bucket brigade algorithm to solve it.
In 1972, Niles Eldredge and Stephen Jay Gould proposed that evolution consists of a series of "punctuated equilibria," alternating long periods of stasis with short bursts of rapid change.
Once the algorithm reaches a local maximum of fitness—a peak in the fitness landscape—it will stay there for a long time until a lucky mutation or crossover lands an individual on the slope to a higher peak, at which point that individual will multiply and climb up the slope with each passing generation. And the higher the current peak, the longer before that happens.
Genetic algorithms are full of random choices: which hypotheses to keep alive and cross over (with fitter hypotheses being more likely candidates), where to cross two strings, which bits to mutate.
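A minimal sketch of such a genetic algorithm on the classic OneMax toy problem (maximize the number of 1-bits); the fitness function, population size, and mutation rate are all illustrative choices, not from the book:

```python
import random

random.seed(0)

def fitness(s):          # OneMax: count of 1-bits in the string
    return sum(s)

def select(pop):         # fitness-proportional ("roulette wheel") selection
    return random.choices(pop, weights=[fitness(s) + 1 for s in pop], k=1)[0]

def crossover(a, b):     # single-point crossover at a random position
    p = random.randrange(1, len(a))
    return a[:p] + b[p:]

def mutate(s, rate=0.05):  # flip each bit independently with small probability
    return [bit ^ (random.random() < rate) for bit in s]

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(60):      # evolve for a fixed number of generations
    pop = [mutate(crossover(select(pop), select(pop))) for _ in pop]
best = max(pop, key=fitness)
print(fitness(best))     # close to the maximum of 20
```

All three random choices the note lists appear here: who gets selected, where the strings cross, and which bits mutate.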
Genetic algorithms make no a priori assumptions about the structures they will learn, other than their general form.
Holland showed that the fitter a schema's representatives in one generation are compared to the average, the more of them we can expect to see in the next generation. So, while the genetic algorithm explicitly manipulates strings, it implicitly searches the much larger space of schemas.
A genetic algorithm is like the leader of a group of gamblers, playing slot machines in every casino in town at the same time. Two schemas compete with each other if they include the same bits and differ in at least one of them, like *10 and *11, and n competing schemas are like n slot machines. Every set of competing schemas is a casino, and the genetic algorithm simultaneously figures out the winning machine in every casino, following the optimal strategy of playing the better-seeming machines with exponentially increasing frequency.
One consequence of crossing over program trees instead of bit strings is that the resulting programs can have any size, making the learning more flexible. The overall tendency is for bloat, however, with larger and larger trees growing as evolution goes on longer (also known as "survival of the fattest").
Genetic programming's first success, in 1995, was in designing electronic circuits. Starting with a pile of electronic components such as transistors, resistors, and capacitors, Koza's system reinvented a previously patented design for a low-pass filter, a circuit that can be used for things like enhancing the bass on a dance-music track.
None of Holland's theoretical results show that crossover actually helps; mutation suffices to exponentially increase the frequency of the fittest schemas in the population over time.
Engineers certainly use building blocks extensively, but combining them involves, well, a lot of engineering; it's not just a matter of throwing them together any old way, and it's not clear crossover can do the trick.
"It takes all the running you can do, to keep in the same place." In this view, organisms are in a perpetual arms race with parasites, and sex helps keep the population varied, so that no single germ can infect all of it.
Christos Papadimitriou and colleagues have shown that sex optimizes not fitness but what they call mixability: a gene's ability to do well on average when combined with other genes. This can be useful when the fitness function is either not known or not constant, as in natural selection, but in machine learning and optimization, hill climbing tends to do better.
With or without crossover, evolving structure is an essential part of the Master Algorithm. The brain can learn anything, but it can't evolve a brain. The Master Algorithm is neither genetic programming nor backprop, but it has to include the key elements of both: structure learning and weight learning.
In Baldwinian evolution, behaviors that are first learned later become genetically hardwired. If dog-like mammals can learn to swim, they have a better chance to evolve into seals—as they did—than if they drown.
Chapter 6: In the Church of Reverend Bayes
For Bayesians, learning is "just" another application of Bayes' theorem, with whole models as the hypotheses and the data as the evidence: as you see more data, some models become more likely and some less, until ideally one model stands out as the clear winner.
Laplace derived his so-called rule of succession, which estimates the probability that the sun will rise again after having risen n times as (n+1)/(n+2). When n=0, this is just 1/2; and as n increases, so does the probability, approaching 1 when n→∞.
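The rule is a one-liner; a quick check of the numbers in the note:

```python
from fractions import Fraction

def rule_of_succession(n):
    """Laplace's rule: probability the sun rises again after rising n times."""
    return Fraction(n + 1, n + 2)

print(rule_of_succession(0))      # 1/2 before any observation
print(rule_of_succession(1000))   # 1001/1002, approaching 1 as n grows
```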
Humans are not very good at Bayesian inference, at least when verbal reasoning is involved. The problem is that we tend to neglect the cause's prior probability.
I put "just" in quotes because implementing Bayes' theorem on a computer turns out to be fiendishly hard for all but the simplest problems, for reasons that we're about to see.
Full Bayes would need a probability for each combination of symptoms and flu/not flu. A learner that uses Bayes' theorem and assumes the effects are independent given the cause is called a Naïve Bayes classifier.
It might not seem so at first, but Naïve Bayes is closely related to the perceptron algorithm. The perceptron adds weights and Naïve Bayes multiplies probabilities, but if you take a logarithm, the latter reduces to the former. Both can be seen as generalizations of simple If ..., then ... rules.
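That log reduction can be seen directly: taking logs turns Naïve Bayes's product of probabilities into a weighted sum, the same linear form a perceptron computes. A sketch with a made-up two-word spam filter (all probabilities invented for illustration):

```python
import math

# P(word | class) for two words, under "spam" and "ham" (made-up values)
p_word = {"spam": {"free": 0.8, "hello": 0.3},
          "ham":  {"free": 0.1, "hello": 0.6}}
prior = {"spam": 0.4, "ham": 0.6}

def nb_score_product(cls, words):
    """Naïve Bayes as usually stated: prior times a product of likelihoods."""
    s = prior[cls]
    for w in words:
        s *= p_word[cls][w]
    return s

def nb_score_sum(cls, words):
    """The same classifier in log space: a sum of weights, like a perceptron."""
    s = math.log(prior[cls])
    for w in words:
        s += math.log(p_word[cls][w])
    return s

msg = ["free", "free", "hello"]
# Both forms pick the same class, since log is monotonic.
by_product = max(("spam", "ham"), key=lambda c: nb_score_product(c, msg))
by_sum = max(("spam", "ham"), key=lambda c: nb_score_sum(c, msg))
print(by_product, by_sum)
```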
If the states and observations are continuous variables instead of discrete ones, the HMM becomes what's known as a Kalman filter.
A more insidious problem is that with confidence-rated rules we're prone to double-counting evidence.
Everything is connected, but only indirectly. In order to affect me, something that happens a mile away must first affect something in my neighborhood, even if only through the propagation of light. As one wag put it, space is the reason everything doesn't happen to you. Put another way, the structure of space is an instance of conditional independence.
Naïve Bayes, Markov chains, and HMMs are all special cases of Bayesian networks.
The structure of Naïve Bayes is a single cause node with an arrow to each of its effects (the figure is not reproduced in these notes).
The trick in MCMC (Markov chain Monte Carlo) is to design a Markov chain that converges to the distribution of our Bayesian network. One easy option is to repeatedly cycle through the variables, sampling each one according to its conditional probability given the state of its neighbors. People often talk about MCMC as a kind of simulation, but it's not: the Markov chain does not simulate any real process; rather, we concocted it to efficiently generate samples from a Bayesian network, which is itself not a sequential model.
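"Cycle through the variables, sampling each given its neighbors" is Gibbs sampling, one standard MCMC scheme. A toy sketch on a made-up Cloudy → Rain → WetGrass chain, estimating P(rain | wet grass) with the evidence clamped:

```python
import random

random.seed(1)

# Tiny chain Cloudy -> Rain -> WetGrass, with made-up probabilities.
P_C = 0.5                        # P(cloudy)
P_R = {True: 0.8, False: 0.2}    # P(rain | cloudy)
P_W = {True: 0.9, False: 0.1}    # P(wet | rain)

def gibbs_p_rain_given_wet(steps=200_000):
    """Clamp the evidence wet=True, then cycle through the hidden variables,
    resampling each from its conditional given the others' current values."""
    cloudy, rain, hits = True, True, 0
    for _ in range(steps):
        # P(cloudy | rain) ∝ P(cloudy) * P(rain | cloudy)
        num = P_C * (P_R[True] if rain else 1 - P_R[True])
        den = num + (1 - P_C) * (P_R[False] if rain else 1 - P_R[False])
        cloudy = random.random() < num / den
        # P(rain | cloudy, wet=True) ∝ P(rain | cloudy) * P(wet | rain)
        num = P_R[cloudy] * P_W[True]
        den = num + (1 - P_R[cloudy]) * P_W[False]
        rain = random.random() < num / den
        hits += rain
    return hits / steps

exact = 0.45 / (0.45 + 0.05)   # P(rain | wet=True) = 0.9 by direct enumeration
est = gibbs_p_rain_given_wet()
print(round(est, 3), exact)
```

On a network this small the answer can be enumerated exactly; the point of the loop is that the same resampling recipe scales to networks where enumeration is hopeless.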
This is justified by the so-called maximum likelihood principle: the likelihood of a hypothesis is P(data | hypothesis), and the principle says we should pick the hypothesis that maximizes it. For a Bayesian, in fact, there is no such thing as the truth; you have a prior distribution over hypotheses, after seeing the data it becomes the posterior distribution, as given by Bayes' theorem, and that's all.
Bayesians can do something much more interesting. They can use the prior distribution to encode experts' knowledge about the problem—their answer to Hume's question. For example, we can design an initial Bayesian network for medical diagnosis by interviewing doctors, asking them which symptoms they think depend on which diseases, and adding the corresponding arrows. This is the "prior network," and the prior distribution can penalize alternative networks by the number of arrows that they add or remove from it.
We can put a prior distribution on any class of hypotheses—sets of rules, neural networks, programs—and then update it with the hypotheses' likelihood given the data. The simplified graph structure makes the models learnable and is worth keeping, but then we're better off just learning the best parameters we can for the task at hand, irrespective of whether they're probabilities.
In Markov networks we can also learn features using hill climbing, similar to rule induction. Either way, gradient descent is a good way to learn the weights. Markov networks can be trained to maximize either the likelihood of the whole data or the conditional likelihood of what we want to predict given what we know. For Siri, the likelihood of the whole data is P(words, sounds), and the conditional likelihood we're interested in is P(words | sounds). By optimizing the latter, we can ignore P(sounds), which is only a distraction from our goal. And since we ignore it, it can be arbitrarily complex. This is much better than HMMs' unrealistic assumption that sounds depend solely on the corresponding words, without any influence from the surroundings.
Bayesian learning works on a single table of data, where each column represents a variable (for example, the expression level of one gene) and each row represents an instance (for example, a single microarray experiment, with each gene's observed expression level). It's OK if the table has "holes" and measurement errors because we can use probabilistic inference to fill in the holes and average over the errors.
But if we have more than one table, Bayesian learning is stuck. It doesn't know how to, for example, combine gene expression data with data about which DNA segments get translated into proteins, and how in turn the three-dimensional shapes of those proteins cause them to lock on to different parts of the DNA molecule, affecting the expression of other genes. In logic, we can easily write rules relating all of these aspects, and learn them from the relevant combinations of tables—but only provided the tables have no holes or errors.
All of the tribes we've met so far have one thing in common: they learn an explicit model of the phenomenon under consideration, whether it's a set of rules, a multilayer perceptron, a genetic program, or a Bayesian network. When they don't have enough data to do that, they're stumped. But analogizers can learn from as little as one example because they never form a model.
Chapter 7: You Are What You Resemble
This chapter covers the nearest-neighbor algorithm and the support vector machine (SVM), among others. Nearest-neighbor is the simplest and fastest learning algorithm ever invented. In fact, you could even say it's the fastest algorithm of any kind that could ever be invented.
Scientists routinely use linear regression to predict continuous variables, but most phenomena are not linear. Luckily, they're locally linear because smooth curves are locally well approximated by straight lines. So if instead of trying to fit a straight line to all the data, you just fit it to the points near the query point, you now have a very powerful nonlinear regression algorithm.
These days all kinds of algorithms are used to recommend items to users, but weighted k-nearest-neighbor was the first widely used one, and it's still hard to beat.
Nearest-neighbor was the first algorithm in history that could take advantage of unlimited amounts of data to learn arbitrarily complex concepts.
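A sketch of weighted k-nearest-neighbor recommendation under simple assumptions (similarity taken as inverse Euclidean distance over co-rated items; users, items, and ratings all made up):

```python
import math

ratings = {   # user -> {item: rating}, made-up data
    "ann":  {"a": 5, "b": 1, "c": 4},
    "bob":  {"a": 4, "b": 2, "c": 5},
    "carl": {"a": 1, "b": 5, "c": 2},
}

def distance(u, v):
    """Euclidean distance over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    return math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common))

def predict(user, item, k=2):
    """Weighted k-NN: average the k nearest raters' scores for `item`,
    weighting each by 1 / (1 + distance)."""
    neighbors = sorted((u for u in ratings if u != user and item in ratings[u]),
                       key=lambda u: distance(user, u))[:k]
    weights = [1 / (1 + distance(user, n)) for n in neighbors]
    return sum(w * ratings[n][item] for w, n in zip(weights, neighbors)) / sum(weights)

# Ann hasn't rated item "d"; predict it from users who have.
ratings["bob"]["d"] = 5
ratings["carl"]["d"] = 1
print(round(predict("ann", "d", k=2), 2))
```

Because Ann's ratings track Bob's far more closely than Carl's, the prediction lands much nearer Bob's 5 than Carl's 1.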
Nearest-neighbor is hopelessly confused by irrelevant attributes because they all contribute to the similarity between examples. With enough irrelevant attributes, accidental similarity in the irrelevant dimensions swamps out meaningful similarity in the important ones, and nearest-neighbor becomes no better than random guessing.
It gets even worse. Nearest-neighbor is based on finding similar objects, and in high dimensions, the notion of similarity itself breaks down. Consider an orange: a tasty ball of pulp surrounded by a thin shell of skin. Let's say 90 percent of the radius of an orange is occupied by pulp, and the remaining 10 percent by skin. That means 73 percent of the volume of the orange is pulp (0.9^3 = 0.729). Now consider a hyperorange: still with 90 percent of the radius occupied by pulp, but in a hundred dimensions, say. The pulp has shrunk to only about three thousandths of a percent of the hyperorange's volume (0.9^100 ≈ 0.00003). The hyperorange is all skin, and you'll never be done peeling it!
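The orange arithmetic is easy to check: the pulp fraction of a d-dimensional ball whose pulp extends to 90 percent of the radius is 0.9 raised to the power d, since volume scales as radius to the d:

```python
def pulp_fraction(d, pulp_radius=0.9):
    """Fraction of a d-dimensional ball's volume inside `pulp_radius`
    (volume scales as radius**d)."""
    return pulp_radius ** d

print(pulp_fraction(3))     # 0.729: about 73% pulp in 3-D
print(pulp_fraction(100))   # roughly 2.7e-5: about 0.003% pulp in 100-D
```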
With a high-dimensional normal distribution, you're more likely to get a sample far from the mean than close to it. A bell curve in hyperspace looks more like a doughnut than a bell.
In fact, no learner is immune to the curse of dimensionality. It's the second worst problem in machine learning, after overfitting.
To handle weakly relevant attributes, one option is to learn attribute weights. Instead of letting the similarity along all dimensions count equally, we "shrink" the less-relevant ones. Data is not spread uniformly in (hyper) space. The examples may have a thousand attributes, but in reality they all "live" in a much lower-dimensional space.
The SVM chooses the support vectors and weights that yield the maximum possible margin.
We have to maximize the margin under the constraint that the weights can only increase up to some fixed value. Or, equivalently, we can minimize the weights under the constraint that all examples have a given margin, which could be one—the precise value is arbitrary. This is what SVMs usually do.
SVMs can be seen as a generalization of the perceptron, because a hyperplane boundary between classes is what you get when you use a particular similarity measure (the dot product between vectors). SVMs have a major advantage compared to multilayer perceptrons: the weights have a single optimum instead of many local ones, and so learning them reliably is much easier.
Provided you can learn them, networks with many layers can express many functions more compactly than SVMs, which always have just one layer, and this can make all the difference.
It turns out that we can view what SVMs do with kernels, support vectors, and weights as mapping the data to a higher-dimensional space and finding a maximum-margin hyperplane in that space. For some kernels, the derived space has infinite dimensions, but SVMs are completely unfazed by that.
Structure mapping takes two descriptions, finds a coherent correspondence between some of their parts and relations, and then, based on that correspondence, transfers further properties from one structure to the other.
The problem is that all the learners we've seen so far need a teacher to tell them the right answer. They can't learn to distinguish tumor cells from healthy ones unless someone labels them "tumor" or "healthy." But humans can learn without a teacher; they do it from the day they're born.
Chapter 8: Learning Without a Teacher
Above all, even though children certainly get plenty of help from their parents, they learn mostly on their own, without supervision, and that's what seems most miraculous.
Whenever we want to learn a statistical model but are missing some crucial information (e.g., the classes of the examples), we can use EM (expectation maximization).
You might have noticed a certain resemblance between k-means and EM, in that they both alternate between assigning entities to clusters and updating the clusters' descriptions. This is not an accident: k-means itself is a special case of EM, which you get when all the attributes have "narrow" normal distributions, that is, normal distributions with very small variance.
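That shared alternation is easy to see in code: a minimal 1-D k-means (the small-variance limit of EM), with made-up data:

```python
def kmeans(points, centers, iters=20):
    """Plain 1-D k-means: assign each point to its nearest center (the E-like
    step), then move each center to the mean of its points (the M-like step)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Keep a center in place if its cluster came up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(sorted(kmeans(data, centers=[0.0, 5.0])))  # [1.0, 9.0]
```

Full EM would replace the hard assignment with fractional responsibilities and update variances too; shrinking the variances recovers exactly this loop.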
One of the most popular algorithms for nonlinear dimensionality reduction, called Isomap, connects each data point in a high-dimensional space to all nearby points, computes the shortest distances between all pairs of points along the resulting network, and finds the reduced coordinates that best approximate these distances.
In effect, reinforcement learning is a kind of speeded-up evolution—trying, discarding, and refining actions within a single lifetime instead of over generations—and by that standard it's extremely efficient.
Chris Watkins sees many things children can do that reinforcement learners can't: solve problems, solve them better after a few attempts, make plans, acquire increasingly abstract knowledge. Luckily, we also have learning algorithms for these higher-level abilities, the most important of which is chunking. Crucially, grouping things into chunks allows us to process much more information than we otherwise could. A chunk in this sense has two parts: the stimulus (a pattern you recognize in the external world or in your short-term memory) and the response (the sequence of actions you execute as a result).
In nonrelational learning, the parameters of a model are tied in only one way: across all the independent examples (e.g., all the patients we've diagnosed). In relational learning, every feature template we create ties the parameters of all its instances.
Chapter 9: The Pieces Of The Puzzle Fall Into Place
Although it is less well known, many of the most important technologies in the world are the result of inventing a unifier, a single mechanism that does what previously required many. As it turns out, it's not hard to combine many different learners into one, using what is known as metalearning. Netflix, Watson, Kinect, and countless others use it, and it's one of the most powerful arrows in the machine learner's quiver. It's also a stepping-stone to the deeper unification that will follow.
Bagging generates random variations of the training set by resampling, applies the same learner to each one, and combines the results by voting.
One of the cleverest metalearners is boosting, created by two learning theorists, Yoav Freund and Rob Schapire. Instead of combining different learners, boosting repeatedly applies the same classifier to the data, using each new model to correct the previous ones' mistakes. It does this by assigning weights to the training examples; the weight of each misclassified example is increased after each round of learning, causing later rounds to focus more on it.
As you approach it from a distance, you can see that the city is made up of three concentric circles, each bounded by a wall. The outer and by far widest circle is Optimization Town. Each house here is an algorithm.
Representation is the formal language in which the learner expresses its models. The symbolists' formal language is logic, of which rules and decision trees are special cases. The connectionists' is neural networks. The evolutionaries' is genetic programs, including classifier systems. The Bayesians' is graphical models, an umbrella term for Bayesian networks and Markov networks. The analogizers' is specific instances, possibly with weights, as in an SVM.

Evaluation is a scoring function that says how good a model is. Symbolists use accuracy or information gain. Connectionists use a continuous error measure, such as squared error, which is the sum of the squares of the differences between the predicted values and the true ones. Bayesians use the posterior probability. Analogizers (at least of the SVM stripe) use the margin. In addition to how well the model fits the data, all tribes take into account other desirable properties, such as the model's simplicity.

Optimization is the algorithm that searches for the highest-scoring model and returns it (this is the learning process). The symbolists' characteristic search algorithm is inverse deduction. The connectionists' is gradient descent. The evolutionaries' is genetic search, including crossover and mutation. The Bayesians are unusual in this regard: they don't just look for the best model, but average over all models, weighted by how probable they are. To do the weighting efficiently, they use probabilistic inference algorithms like MCMC. The analogizers (or more precisely, the SVM mavens) use constrained optimization to find the best model.
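The representation/evaluation/optimization decomposition can be sketched as a learner assembled from three swappable pieces. In this illustration the choices are the connectionists' pair, a linear model family with squared error and gradient descent; the data and hyperparameters are made up.

```python
# A learner = representation + evaluation + optimization.

def represent(w):
    # representation: the family of models y = w * x
    return lambda x: w * x

def evaluate(w, data):
    # evaluation: squared error of model w on the data
    return sum((represent(w)(x) - y) ** 2 for x, y in data)

def optimize(data, lr=0.01, steps=200):
    # optimization: gradient descent on the evaluation function
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data)
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # y = 2x, noise-free
w = optimize(data)
```

Swapping any one piece (say, genetic search for gradient descent, or accuracy for squared error) yields a different tribe's learner without touching the other two, which is the point of the decomposition.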
Chapter 10: This Is The World Of Machine Learning
Eventually, we'll start talking about the employment rate instead of the unemployment one and reducing it will be seen as a sign of progress. People will seek meaning in human relationships, self-actualization, and spirituality, much as they do now. The need to earn a living will be a distant memory, another piece of humanity's barbaric past that we rose above.
Technology is the extended phenotype of man. This means we can continue to control it even if it becomes far more complex than we can understand. People worry that computers will get too smart and take over the world, but the real problem is that they're too stupid and they've already taken over the world.
The statistician knows that prediction is hard, especially about the future, and the computer scientist knows that the best way to predict the future is to invent it, but the unexamined future is not worth inventing.